[ 
https://issues.apache.org/jira/browse/HDFS-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003564#comment-16003564
 ] 

Anatoli Shein commented on HDFS-10785:
--------------------------------------

Thanks [~James C] for the review. I have addressed your comments as follows:

-Looks like the tools_common getOptions function pulls from hdfs-site.xml and 
core-site.xml. Can you also make it allow the environment's HADOOP_CONF_DIR to 
override those?

* The environment's HADOOP_CONF_DIR seems to override the default values 
already (in ConfigurationLoader::SetDefaultSearchPath())

-hdfs_get and hdfs_copyToLocal are really the same thing other than the help 
message. Is there any reason to have two? hdfs_moveToLocal is also the same 
thing other than doing the delete at the end. I think it'd be straightforward 
to factor out the main copy loop into a function and call that from all 3; this 
would make things a lot more maintainable.

* hdfs_get and hdfs_copyToLocal currently do the same thing; however, they will 
diverge once we implement write functionality, since hdfs_get will also be able 
to write to HDFS. I separated the main copy loop out into a readFile function, 
which is now used by hdfs_moveToLocal, hdfs_cat, hdfs_tail, hdfs_get, and 
hdfs_copyToLocal.

-Is it worth making hdfs_ls have the option to be recursive? Seems like we 
could get the same functionality with hdfs_find, can we share more code between 
the two?

* We gave hdfs_ls a recursive option in order to stay compatible with the ls 
command of the Java client 
(https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#ls). 
There are still more flags to add support for after this initial patch lands. 
The recursive ls is implemented with a call to fs->Find, so that code is 
already shared; I am not sure we can reuse more code between ls and find right 
now.

-How about moving the printStatInfo function in tools_common.h to be a .str() 
method on the StatInfo object in include/statinfo.h. That looks like it'd be 
useful for general debugging and logging in client applications as well. Same 
with printFsInfo, printContentSummary etc.

* This is done.

-in hdfs_copyToLocal you don't need the break after the call to exit. Looks 
like the same issue is in other tools. It'd be nice to generalize the options 
parsing so you don't have that block of code in all the tools, not sure if 
there's a good way to do it though.
      usage();
      exit(EXIT_SUCCESS);
      break;

* I removed this extra break in all the tools. I don't currently see a good way 
to generalize the GetOpt options parsing between the tools though.

-also in hdfs_copyToLocal either check that hdfs::Options isn't empty and call 
exit with a warning or use a default one (seems reasonable). Accessing 
optionals that haven't been set will throw some error but it's going to be 
something confusing for end users.

* Solving HDFS-9539 should solve the options problem. Currently the default 
values loaded are always empty, so the error message looks like: "Error 
connecting to the cluster: defaultFS is empty. defaultFS of [] is not supported"

-More generally it seems like it'd make sense to stick the config parsing, 
timeout limit changes, and FileSystem creation into a helper function in 
tools_common so things can be changed in a single place.

* Done: moved everything into a doConnect function.

-also in hdfs_copyToLocal: I'd increase BUF_SIZE to something a lot larger than 
4096. We still don't have caching for DN connections so copying a large file is 
going to have a lot of overhead just setting up connections to the DN. Consider 
moving it to a global variable so it can live in bss/data rather than the 
stack; valgrind doesn't pick up on stack corruption well and it's possible 
(though unlikely) that someone will set the system's max stack size to be 
pretty small.

* Done. I increased it to 1 MB and made it global.

-would be really nice if hdfs_tail let you specify how many lines to print. 
Still helpful and a lot less work would be specifying how many bytes to print.

* The Java client does not provide a way to specify the number of bytes/lines 
to print 
(https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#tail), 
so I am not sure this improvement should be part of this patch.

-IMHO nitpicking coding style stuff to death during reviews is usually an 
exercise in bikeshedding, but could you make sure tab spacing is consistent? 
For example in hdfs_chmod.cpp you switch between 2 and 4 spaces. (turning .cpp 
to .cc extensions for consistency would be nice too)

* I fixed this and the other spacing issues I noticed, and changed all .cpp 
extensions to .cc.

-Have you tested all of these tools using valgrind and large directory trees? We 
are in a bad state right now since the minidfscluster tests can't be run under 
valgrind (JIT compilation + GC blow things up) so memory bugs keep slipping in. 
I'm going to file a jira to fork the process before starting the minidfscluster 
so we can start getting coverage but I'm not sure I'll get to it for a couple 
weeks.

* Yes, I tested them with large directory trees under valgrind. The only leak 
reported is the 'still reachable: 72,704 bytes' one, which is a known GCC 5 
issue.


> libhdfs++: Implement the rest of the tools
> ------------------------------------------
>
>                 Key: HDFS-10785
>                 URL: https://issues.apache.org/jira/browse/HDFS-10785
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Anatoli Shein
>            Assignee: Anatoli Shein
>         Attachments: HDFS-10785.HDFS-8707.000.patch, 
> HDFS-10785.HDFS-8707.001.patch, HDFS-10785.HDFS-8707.002.patch, 
> HDFS-10785.HDFS-8707.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to