[
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437416#comment-15437416
]
Anatoli Shein edited comment on HDFS-10754 at 8/25/16 6:33 PM:
---------------------------------------------------------------
Thank for the review, [~bobhansen].
I have addressed your comments as follows:
* The new recursive methods should not use future/promises internally. That
blocks one of the asio threads waiting for more data; if a consumer tried to do
one of these with a single thread in the threadpool, it would deadlock waiting
for the subtasks to complete, but they'd all get queued up behind the initial
handler.
* Instead, whenever they're all done (request_count == 0), the last one out the
door (the handler that dropped the request_count to 0) should call into the
consumer's handler directly with the final status. If any of the other threads
has received an error, all of the subsequent deliveries to the handler should
return false, telling find "I'm going to report an error anyway, so don't
bother recursing any more." It's good to wait until request_count==0, even in
an error state, so the consumer doesn't have any lame duck requests queued up
to take care of?
* Also because all of this is asynchronous, you can't allocate the lock and
state variables on the stack. When the consumer calls SetOwner('/', true,
handler), the function is going to return as soon as the find operation is
kicked off, destroying all of the elements on the stack. We'll need to create a
little struct for SetOwner that is maintained with a shared_ptr and cleaned up
when the last request is done.
(/) I removed futures and promises from recursive methods, and created a small
struct to keep the state. Now it is purely async.
Minor points:
* In find, perhaps recursion_counter is a bit of a misnomer at this point. It's
more outstanding_requests, since for big directories, we'll have more requests
without recursing.
(/) Done.
* Perhaps FindOperationState is a better name than CurrentState, and
SharedFindState is better than just SharedState (since we might have many
shared states in the FileSystem class).
(/) Done.
* In CurrentState, perhaps "depth" is more accurate than position?
(/) Yes, I changed it to "depth" now.
* Do we support a globbing find without recursion? Can I find "/dir?/path*/"
"*.db", and not have it recurse to the sub-directories of path*?
(/) POSIX find supports globbing without recursion by setting maxdepth to zero.
I added maxdepth functionality for our tool also.
* Can we push the shims and state into the .cpp file and keep them out of the
itnerface (even if private)?
(/) Done.
was (Author: anatoli.shein):
Thank for the review, [~bobhansen].
I have addressed your comments as follows:
* The new recursive methods should not use future/promises internally. That
blocks one of the asio threads waiting for more data; if a consumer tried to do
one of these with a single thread in the threadpool, it would deadlock waiting
for the subtasks to complete, but they'd all get queued up behind the initial
handler.
Instead, whenever they're all done (request_count == 0), the last one out the
door (the handler that dropped the request_count to 0) should call into the
consumer's handler directly with the final status. If any of the other threads
has received an error, all of the subsequent deliveries to the handler should
return false, telling find "I'm going to report an error anyway, so don't
bother recursing any more." It's good to wait until request_count==0, even in
an error state, so the consumer doesn't have any lame duck requests queued up
to take care of?
Also because all of this is asynchronous, you can't allocate the lock and state
variables on the stack. When the consumer calls SetOwner('/', true, handler),
the function is going to return as soon as the find operation is kicked off,
destroying all of the elements on the stack. We'll need to create a little
struct for SetOwner that is maintained with a shared_ptr and cleaned up when
the last request is done.
(/) I removed futures and promises from recursive methods, and created a small
struct to keep the state. Now it is purely async.
Minor points:
* In find, perhaps recursion_counter is a bit of a misnomer at this point. It's
more outstanding_requests, since for big directories, we'll have more requests
without recursing.
(/) Done.
* Perhaps FindOperationState is a better name than CurrentState, and
SharedFindState is better than just SharedState (since we might have many
shared states in the FileSystem class).
(/) Done.
* In CurrentState, perhaps "depth" is more accurate than position?
(/) Yes, I changed it to "depth" now.
* Do we support a globbing find without recursion? Can I find "/dir?/path*/"
"*.db", and not have it recurse to the sub-directories of path*?
(/) POSIX find supports globbing without recursion by setting maxdepth to zero.
I added maxdepth functionality for our tool also.
* Can we push the shims and state into the .cpp file and keep them out of the
itnerface (even if private)?
(/) Done.
> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp,
> hdfs_chown, hdfs_chmod and hdfs_find.
> -----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10754
> URL: https://issues.apache.org/jira/browse/HDFS-10754
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: Anatoli Shein
> Assignee: Anatoli Shein
> Attachments: HDFS-10754.HDFS-8707.000.patch,
> HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch,
> HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch,
> HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch,
> HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch,
> HDFS-10754.HDFS-8707.009.patch, HDFS-10754.HDFS-8707.010.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]