[
https://issues.apache.org/jira/browse/HDFS-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anatoli Shein updated HDFS-10785:
---------------------------------
Attachment: HDFS-10785.HDFS-8707.005.patch
Thank you for the review [~James C]. I have addressed it in my new patch
(attached) as follows:
1) FsInfo::str takes a URI object. Can we have the str method just format the
contents of the FsInfo struct? If a tool wants to prefix those contents with a
URI it can do it in a more application specific way elsewhere. The main issue
here is that because you include common/uri.h you're pulling implementation
details into the public headers, so those headers alone wouldn't be enough to
link against the library.
(/) I removed the URI object from being passed to the str() method of FsInfo,
however it still takes string fs_name in order to output the FsInfo object
using the correct df format. Maybe we should consider adding another str method
separately to output just the struct?
2) In the headers for FsInfo, StatInfo, and ContentSummary you include iomanip
and sstream. You can push these includes down to the implementation since they
only need std::string. content_summary.h also includes stdint.h, but c++11
should have all the (u)int<width>_t types built in, was there some other reason
you needed to include this? Maybe older compiler with partial c++11 support?
(/) Done. I moved iomanip and sstream includes to the implementation files, and
removed the stdint.h include.
3) in hdfs_ls your handler takes an std::promise by reference. This gets into
race condition issues described in HDFS-11556 which you might be less likely to
hit if you're using a single worker thread. For now just wrap the promise in a
shared_ptr and capture the shared_ptr by copy.
That issue is amplified here (from same block of code):
{code:language=c++}
if (!has_more_results) {
promise.set_value(); //set promise
return false; //request stop sending results
}
return true; //request more results
{code}
If you have something blocked on that promise it's possible that it unblocks
right away and you have a race for assigning values. It looks like you can get
away with it here but I'd really try to avoid that by having the lambda take a
bool&. Tracing if it's possible to hit that race condition for every usage is
very time consuming and easy to get wrong.
(i) Actually, it this situation I don't think the code has a race condition. We
first pass the promise to the handler by reference, and block until it is set.
The function that will use this handler (either GetListing or Find) guarantees
that the handler will only be called once at a time (there is locking in place
for that). The function also guarantees that the handler will be called only
once with the exit status set (meaning has_more_results set to false),
therefore it is impossible for the promise to be set twice or referenced when
it was deallocated from stack. Therefore I think this code is safe, and adding
a shared pointer would add unnecessary overhead. I did also test it with
multiple threads and never got an issue.
Other:
This is just personal preference so I'm not going to push 1 way or the other,
but IMHO sprintf can be a lot more concise than iostream+iomanip for places
you're doing fixed width output. If you prefer iomanip that's cool too.
(i) I chose iostream+iomanip because it is easily extensible and is typesafe,
however it does add more code, so we can convert to sprintf later if needed.
Some of the handlers for the recursive async calls may be better off as structs
with an overloaded operator() (lambdas are just shorthand for that anyway).
Lambdas are good for tiny chunks of code but can become confusing once there's
a mix of local variables along with stuff being captured particularly when you
have them inside main(). Doesn't need to be fixed in this patch since the other
tools do it but if you can think of a nice way to encapsulate that
functionality (at least pull if from main) it'd make it easier to new
contributors (and reviewers) to understand and debug.
(i) Yes, we will have to think of a good way to do this.
The recursive functions like chmod/chgrp/chown/du could be simplified a lot by
generalizing FileSystem::Find into a file tree walk
(http://man7.org/linux/man-pages/man3/ftw.3.html) and just plugging in a lambda
that did the modification to change permissions or whatever else. Find does
most of this work already but it seems like things that use it have a lot of
glue code in common.
(i) We can try and do this, but it would significantly increase the complexity
of Find, since it allows wild cards in both path and name, while other methods
just recurse once.
Another thing that's not specific to your patch but might be worth considering
in the future is when you have long function or method signatures it's hard to
tell what bool flags are doing; especially once that are just at the end to
toggle something like printing extra info. A 2 value enum e.g. {INCLUDE_QUOTA,
OMIT_QUOTA} in the case of ContentSummary::str(<flag_type>) makes this more
clear at the call site and machine code the compiler emits will generally look
the same.
(i) Normally I think it makes sense to use enums when we have more than two
possible values of the variable in order to keep things simple, but we can
always change this. For the time being I just renamed the boolean 'quota' to
'include_quota' so that it is more intuitive.
> libhdfs++: Implement the rest of the tools
> ------------------------------------------
>
> Key: HDFS-10785
> URL: https://issues.apache.org/jira/browse/HDFS-10785
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: Anatoli Shein
> Assignee: Anatoli Shein
> Attachments: HDFS-10785.HDFS-8707.000.patch,
> HDFS-10785.HDFS-8707.001.patch, HDFS-10785.HDFS-8707.002.patch,
> HDFS-10785.HDFS-8707.003.patch, HDFS-10785.HDFS-8707.004.patch,
> HDFS-10785.HDFS-8707.005.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]