[
https://issues.apache.org/jira/browse/HDFS-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998586#comment-15998586
]
James Clampffer commented on HDFS-10785:
----------------------------------------
Thanks for updating this [~anatoli.shein]!
Comments:
-Looks like the tools_common getOptions function pulls from hdfs-site.xml and
core-site.xml. Can you also make it let the environment's HADOOP_CONF_DIR
override those?
-hdfs_get and hdfs_copyToLocal are really the same thing apart from the help
message. Is there any reason to have two? hdfs_moveToLocal is also the same
thing except that it deletes the source at the end. I think it'd be
straightforward to factor out the main copy loop into a function and call that
from all three (see the sketch below); this would make things a lot more
maintainable.
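Rough sketch of what I'm picturing (copyFileToLocal is a made-up name, and I'm
guessing at the FileHandle::Read signature from the branch, so adjust to
whatever the patch actually uses):
{code}
#include <cstddef>
#include <cstdio>
#include <iostream>
#include <memory>
#include <string>

#include "hdfspp/hdfspp.h"

static const size_t BUF_SIZE = 1048576;

// Hypothetical shared helper: hdfs_get and hdfs_copyToLocal would just call
// this, and hdfs_moveToLocal would call it and then delete the source.
static bool copyFileToLocal(std::shared_ptr<hdfs::FileSystem> fs,
                            const std::string &remote_path,
                            const std::string &local_path) {
  hdfs::FileHandle *handle_raw = nullptr;
  hdfs::Status status = fs->Open(remote_path, &handle_raw);
  if (!status.ok()) {
    std::cerr << "Could not open " << remote_path << std::endl;
    return false;
  }
  std::unique_ptr<hdfs::FileHandle> file(handle_raw);

  std::FILE *dst = std::fopen(local_path.c_str(), "wb");
  if (!dst) {
    std::cerr << "Could not open " << local_path << std::endl;
    return false;
  }

  static char buf[BUF_SIZE];  // static per the BUF_SIZE comment further down
  size_t bytes_read = 0;
  do {
    // Assumed signature: Status Read(void *buf, size_t buf_size, size_t *bytes_read),
    // returning ok with bytes_read == 0 at EOF.
    status = file->Read(buf, sizeof(buf), &bytes_read);
    if (!status.ok()) {
      break;
    }
    std::fwrite(buf, 1, bytes_read, dst);
  } while (bytes_read > 0);

  std::fclose(dst);
  return status.ok();
}
{code}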
-Is it worth giving hdfs_ls the option to be recursive? It seems like we could
get the same functionality with hdfs_find; can we share more code between the
two?
-How about moving the printStatInfo function in tools_common.h to a .str()
method on the StatInfo object in include/statinfo.h (see the sketch below)?
That looks like it'd be useful for general debugging and logging in client
applications as well. Same with printFsInfo, printContentSummary, etc.
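Something like this is what I have in mind (a sketch only; the field names
shown are illustrative, use whatever StatInfo actually carries):
{code}
#include <sstream>
#include <string>

namespace hdfs {

struct StatInfo {
  // ... existing fields (full_path, owner, group, length, permissions, ...) ...
  std::string full_path;
  std::string owner;
  std::string group;
  unsigned long long length;

  // Human-readable summary; handy for the tools and for client-side logging.
  std::string str() const {
    std::stringstream ss;
    ss << full_path << " owner=" << owner << " group=" << group
       << " length=" << length;
    return ss.str();
  }
};

} // namespace hdfs
{code}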
-In hdfs_copyToLocal you don't need the break after the call to exit; the same
issue shows up in the other tools. It'd be nice to generalize the options
parsing so that block of code isn't repeated in every tool (see the sketch
after the snippet below), though I'm not sure there's a good way to do it.
{code}
usage();
exit(EXIT_SUCCESS);
break;
{code}
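One possible shape, purely as a sketch (parseCommonOptions is a made-up name,
and this assumes the tools keep using getopt):
{code}
#include <cstdlib>
#include <iostream>
#include <unistd.h>

// Hypothetical tools_common helper: handle the flags every tool shares
// (currently just -h) in one place and return the index of the first
// positional argument.
static int parseCommonOptions(int argc, char *argv[], const char *usage_msg) {
  int input;
  while ((input = getopt(argc, argv, "h")) != -1) {
    switch (input) {
      case 'h':
        std::cout << usage_msg << std::endl;
        exit(EXIT_SUCCESS);  // no unreachable break needed after exit
      default:
        std::cerr << usage_msg << std::endl;
        exit(EXIT_FAILURE);
    }
  }
  return optind;
}
{code}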
-Also in hdfs_copyToLocal: either check that hdfs::Options isn't empty and
exit with a warning, or fall back to a default one (seems reasonable).
Dereferencing an optional that hasn't been set will throw some error, but it'll
be something confusing for end users (see the sketch after the snippet below).
{code}
//TODO: HDFS-9539 Currently options can be returned empty
hdfs::Options options = *hdfs::getOptions();
{code}
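Sketch of what I mean (assuming getOptions returns an optional-like type, as
the dereference above suggests; either branch is fine, the point is to never
dereference an unset optional):
{code}
// TODO: HDFS-9539 Currently options can be returned empty
auto opts = hdfs::getOptions();
if (!opts) {
  // Either warn and fall back to defaults, or exit with a clear message;
  // both beat the bad_optional_access-style error the user gets today.
  std::cerr << "Warning: no configuration found, using default options"
            << std::endl;
}
hdfs::Options options = opts ? *opts : hdfs::Options();
{code}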
-More generally, it seems like it'd make sense to put the config parsing,
timeout limit changes, and FileSystem creation into a helper function in
tools_common so things can be changed in a single place (see the sketch after
the snippet below).
{code}
//TODO: HDFS-9539 Currently options can be returned empty
hdfs::Options options = *hdfs::getOptions();

//TODO: HDFS-9539 - until then we increase the time-out to allow all
//recursive async calls to finish
options.rpc_timeout = std::numeric_limits<int>::max();

std::shared_ptr<hdfs::FileSystem> fs = hdfs::doConnect(uri.value(), options);
if (!fs) {
  std::cerr << "Could not connect the file system. " << std::endl;
  exit(EXIT_FAILURE);
}
{code}
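e.g., something along these lines (connectFromConfig is a hypothetical name;
doConnect and getOptions are from the patch, and the parameter type should
match however the patch handles the URI):
{code}
#include <limits>
#include <memory>

// Hypothetical tools_common helper so the config parsing, timeout tweak, and
// FileSystem creation live in one place.
static std::shared_ptr<hdfs::FileSystem> connectFromConfig(const hdfs::URI &uri) {
  // TODO: HDFS-9539 Currently options can be returned empty
  hdfs::Options options = *hdfs::getOptions();
  // TODO: HDFS-9539 - until then we increase the time-out to allow all
  // recursive async calls to finish
  options.rpc_timeout = std::numeric_limits<int>::max();
  return hdfs::doConnect(uri, options);
}
{code}
Each tool would then shrink to a connectFromConfig(uri.value()) call plus the
null check.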
-Also in hdfs_copyToLocal: I'd increase BUF_SIZE to something a lot larger than
4096. We still don't have caching for DN connections, so copying a large file
is going to incur a lot of overhead just setting up connections to the DN.
Consider moving the buffer to a global variable so it lives in bss/data rather
than on the stack (see the sketch below); valgrind doesn't pick up on stack
corruption well, and it's possible (though unlikely) that someone will set the
system's max stack size pretty small.
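i.e., something like this (1 MB is an arbitrary number for illustration;
benchmark to pick a real one):
{code}
#include <cstddef>

// Larger transfer buffer in static storage (bss) instead of the stack.
static const size_t BUF_SIZE = 1048576;
static char input_buffer[BUF_SIZE];
{code}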
-It would be really nice if hdfs_tail let you specify how many lines to print.
Specifying how many bytes to print would be a lot less work and still helpful.
-IMHO nitpicking coding style to death during reviews is usually an exercise in
bikeshedding, but could you make sure the indentation is consistent? For
example, in hdfs_chmod.cpp you switch between 2 and 4 spaces. (Renaming the
.cpp files to .cc for consistency would be nice too.)
{code}
auto handlerFind = [fs, state](const hdfs::Status &status_find,
                               const std::vector<hdfs::StatInfo> &stat_infos,
                               bool has_more_results) -> bool {
  //For each result returned by Find we call async SetPermission with the
  //handler below.
  //SetPermission DOES NOT guarantee that the handler will only be called
  //once at a time, so we DO need locking in handlerSetPermission.
  auto handlerSetPermission = [state](const hdfs::Status &status_set_permission) {
{code}
Have you tested all of these tools with valgrind and large directory trees?
We're in a bad state right now since the minidfscluster tests can't be run
under valgrind (JIT compilation + GC blow things up), so memory bugs keep
slipping in. I'm going to file a JIRA to fork the process before starting the
minidfscluster so we can start getting coverage, but I'm not sure I'll get to
it for a couple of weeks.
> libhdfs++: Implement the rest of the tools
> ------------------------------------------
>
> Key: HDFS-10785
> URL: https://issues.apache.org/jira/browse/HDFS-10785
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: Anatoli Shein
> Assignee: Anatoli Shein
> Attachments: HDFS-10785.HDFS-8707.000.patch,
> HDFS-10785.HDFS-8707.001.patch, HDFS-10785.HDFS-8707.002.patch
>
>