[ https://issues.apache.org/jira/browse/HDFS-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998586#comment-15998586 ]

James Clampffer commented on HDFS-10785:
----------------------------------------

Thanks for updating this [~anatoli.shein]!

Comments:
-Looks like the tools_common getOptions function pulls from hdfs-site.xml and 
core-site.xml.  Can you also make it let the environment's HADOOP_CONF_DIR 
override those?
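
Something along these lines, maybe (the ConfigurationLoader/SetSearchPath 
names here are from memory of what's in lib/common, so treat this as a sketch 
rather than the exact API):
{code}
#include <cstdlib>

// Sketch: prefer HADOOP_CONF_DIR from the environment when it's set,
// otherwise keep the current default search path.
hdfs::ConfigurationLoader loader;
const char *hadoop_conf_dir = std::getenv("HADOOP_CONF_DIR");
if (hadoop_conf_dir != nullptr) {
  loader.SetSearchPath(hadoop_conf_dir);
}
// Pulls hdfs-site.xml and core-site.xml from the search path as before.
hdfs::optional<hdfs::HdfsConfiguration> config =
    loader.LoadDefaultResources<hdfs::HdfsConfiguration>();
{code}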

-hdfs_get and hdfs_copyToLocal are really the same thing other than the help 
message.  Is there any reason to have two?  hdfs_moveToLocal is also the same 
thing other than doing the delete at the end.  I think it'd be straightforward 
to factor out the main copy loop into a function and call that from all 3; this 
would make things a lot more maintainable.
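
Rough shape of what I'm picturing (copyFileToLocal is a made-up name, and the 
body is just whatever the loop in hdfs_get does now):
{code}
// tools_common: one copy loop shared by all three tools.
void copyFileToLocal(std::shared_ptr<hdfs::FileSystem> fs,
                     const std::string &src, const std::string &dest) {
  // open src on hdfs, loop read -> write to the local dest until EOF,
  // then close both
}

// hdfs_get and hdfs_copyToLocal: just call copyFileToLocal(fs, src, dest);
// hdfs_moveToLocal: copyFileToLocal(fs, src, dest), then delete src.
{code}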

-Is it worth making hdfs_ls have the option to be recursive?  Seems like we 
could get the same functionality with hdfs_find; can we share more code between 
the two?
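
e.g. if a -R flag just drove the same Find call with a different max depth 
(Find signature from memory, may be off):
{code}
// Non-recursive ls = depth 1; -R = unbounded.  Same handler either way.
uint32_t max_depth = recursive ? hdfs::FileSystem::GetDefaultFindMaxDepth() : 1;
fs->Find(path, "*", max_depth, handlerFind);
{code}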

-How about moving the printStatInfo function in tools_common.h to be a .str() 
method on the StatInfo object in include/statinfo.h?  That looks like it'd be 
useful for general debugging and logging in client applications as well.  Same 
with printFsInfo, printContentSummary, etc.
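
i.e. something like this in statinfo.h (the field names are guesses at what 
StatInfo already carries):
{code}
#include <sstream>
#include <string>

// Useful for logging in client apps, and the tools can share it too.
std::string StatInfo::str() const {
  std::stringstream ss;
  ss << full_path << " length: " << length << " permissions: " << permissions;
  return ss.str();
}
{code}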

-in hdfs_copyToLocal you don't need the break after the call to exit.  Looks 
like the same issue is in other tools.  It'd be nice to generalize the options 
parsing so you don't have that block of code in all the tools; not sure if 
there's a good way to do it though (one idea sketched below the snippet).
{code}
      usage();
      exit(EXIT_SUCCESS);
      break;
{code}
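
One possible shape: a tools_common helper that every tool's getopt loop calls 
first (handleCommonOption is a made-up name):
{code}
#include <cstdlib>
#include <iostream>

// Returns true if the option was consumed; exit() never returns, so no
// unreachable break is needed after it.
bool handleCommonOption(int c, const char *usage_msg) {
  if (c == 'h') {
    std::cout << usage_msg << std::endl;
    exit(EXIT_SUCCESS);
  }
  return false;  // tool-specific option, the caller handles it
}
{code}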

-also in hdfs_copyToLocal either check that hdfs::Options isn't empty and call 
exit with a warning, or use a default one (seems reasonable).  Dereferencing an 
optional that hasn't been set is undefined behavior (and .value() throws), so 
either way end users are going to see something confusing.
{code}
    //TODO: HDFS-9539 Currently options can be returned empty
    hdfs::Options options = *hdfs::getOptions();
{code}
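
e.g. (assuming getOptions returns an optional, as the * suggests):
{code}
hdfs::optional<hdfs::Options> parsed = hdfs::getOptions();
if (!parsed) {
  std::cerr << "Warning: no config found, using default options" << std::endl;
}
hdfs::Options options = parsed ? *parsed : hdfs::Options();
{code}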

-More generally it seems like it'd make sense to stick the config parsing, 
timeout limit changes, and FileSystem creation into a helper function in 
tools_common so things can be changed in a single place.
{code}
    //TODO: HDFS-9539 Currently options can be returned empty
    hdfs::Options options = *hdfs::getOptions();

    //TODO: HDFS-9539 - until then we increase the time-out to allow all recursive async calls to finish
    options.rpc_timeout = std::numeric_limits<int>::max();

    std::shared_ptr<hdfs::FileSystem> fs = hdfs::doConnect(uri.value(), options);
    if (!fs) {
      std::cerr << "Could not connect the file system. " << std::endl;
      exit(EXIT_FAILURE);
    }
{code}
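
i.e. collapse that into something like (doConnectToFs is a made-up name):
{code}
// tools_common: config parsing + timeout bump + connect, all in one place.
std::shared_ptr<hdfs::FileSystem> doConnectToFs(const hdfs::optional<hdfs::URI> &uri) {
  //TODO: HDFS-9539 Currently options can be returned empty
  hdfs::Options options = *hdfs::getOptions();

  //TODO: HDFS-9539 - until then we increase the time-out to allow all recursive async calls to finish
  options.rpc_timeout = std::numeric_limits<int>::max();

  std::shared_ptr<hdfs::FileSystem> fs = hdfs::doConnect(uri.value(), options);
  if (!fs) {
    std::cerr << "Could not connect to the file system." << std::endl;
    exit(EXIT_FAILURE);
  }
  return fs;
}
{code}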

-also in hdfs_copyToLocal: I'd increase BUF_SIZE to something a lot larger than 
4096.  We still don't have caching for DN connections, so copying a large file 
is going to have a lot of overhead just setting up connections to the DN.  
Consider moving it to a global variable so it can live in bss/data rather than 
the stack; valgrind doesn't pick up on stack corruption well, and it's possible 
(though unlikely) that someone will set the system's max stack size to be 
pretty small.
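
e.g.:
{code}
// 1MB instead of 4KB; file scope so it lives in bss/data, not on the stack.
static const size_t BUF_SIZE = 1048576;
static char input_buffer[BUF_SIZE];
{code}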

-would be really nice if hdfs_tail let you specify how many lines to print.  
Specifying how many bytes to print instead would be a lot less work and still 
helpful.
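
The byte version is basically a stat plus one offset read, roughly (stat_info 
would come from GetFileInfo on the path; the read loop is whatever hdfs_cat 
already does, just starting at offset):
{code}
// -c <n_bytes>: print the last n_bytes of the file.
uint64_t offset = stat_info.length > n_bytes ? stat_info.length - n_bytes : 0;
// read [offset, stat_info.length) and write it to stdout
{code}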

-IMHO nitpicking coding style stuff to death during reviews is usually an 
exercise in bikeshedding, but could you make sure tab spacing is consistent?  
For example in hdfs_chmod.cpp you switch between 2 and 4 spaces.  (turning .cpp 
to .cc extensions for consistency would be nice too)
{code}
    auto handlerFind = [fs, state](const hdfs::Status &status_find, const std::vector<hdfs::StatInfo> & stat_infos, bool has_more_results) -> bool {

        //For each result returned by Find we call async SetPermission with the handler below.
        //SetPermission DOES NOT guarantee that the handler will only be called once at a time, so we DO need locking in handlerSetPermission.
        auto handlerSetPermission = [state](const hdfs::Status &status_set_permission) {
{code}

Have you tested all of these tools using valgrind and large directory trees?  
We are in a bad state right now since the minidfscluster tests can't be run 
under valgrind (JIT compilation + GC blow things up), so memory bugs keep 
slipping in.  I'm going to file a jira to fork the process before starting the 
minidfscluster so we can start getting coverage, but I'm not sure I'll get to 
it for a couple of weeks.

> libhdfs++: Implement the rest of the tools
> ------------------------------------------
>
>                 Key: HDFS-10785
>                 URL: https://issues.apache.org/jira/browse/HDFS-10785
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Anatoli Shein
>            Assignee: Anatoli Shein
>         Attachments: HDFS-10785.HDFS-8707.000.patch, 
> HDFS-10785.HDFS-8707.001.patch, HDFS-10785.HDFS-8707.002.patch
>
>



