[ 
https://issues.apache.org/jira/browse/HDFS-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435809#comment-15435809
 ] 

Bob Hansen edited comment on HDFS-10754 at 8/24/16 10:10 PM:
-------------------------------------------------------------

Thanks for your hard work, [~anatoli.shein].  I'm sorry to have to keep 
dragging you back, but...

The new recursive methods should not use future/promises internally.  That 
blocks one of the asio threads waiting for more data; if a consumer tried to do 
one of these with a single thread in the threadpool, it would deadlock waiting 
for the subtasks to complete, but they'd all get queued up behind the initial 
handler.

Instead, whenever they're all done (request_count == 0), the last one out the 
door (the handler that dropped the request_count to 0) should call into the 
consumer's handler directly with the final status.  If any of the other threads 
has received an error, all of the subsequent deliveries to the handler should 
return false, telling find "I'm going to report an error anyway, so don't 
bother recursing any more."  It's good to wait until request_count==0, even in 
an error state, so the consumer doesn't have any lame duck requests queued up 
to take care of?

Also because all of this is asynchronous, you can't allocate the lock and state 
variables on the stack.  When the consumer calls SetOwner('/', true, handler), 
the function is going to return as soon as the find operation is kicked off, 
destroying all of the elements on the stack.  We'll need to create a little 
struct for SetOwner that is maintained with a shared_ptr and cleaned up when 
the last request is done.



Minor points:
In find, perhaps recursion_counter is a bit of a misnomer at this point.  It's 
more outstanding_requests, since for big directories, we'll have more requests 
without recursing.

Perhaps FindOperationState is a better name than CurrentState, and 
SharedFindState is better than just SharedState (since we might have many 
shared states in the FileSystem class).

In CurrentState, perhaps "depth" is more accurate than position?

Do we support a globbing find without recursion?  Can I find "/dir?/path*/" 
"\*.db", and not have it recurse to the sub-directories of path*?

Can we push the shims and state into the .cpp file and keep them out of the 
itnerface (even if private)?

We're very close, now.



was (Author: bobhansen):
Thanks for your hard work, [~anatoli.shein].  I'm sorry to have to keep 
dragging you back, but...

The new recursive methods should not use future/promises internally.  That 
blocks one of the asio threads waiting for more data; if a consumer tried to do 
one of these with a single thread in the threadpool, it would deadlock waiting 
for the subtasks to complete, but they'd all get queued up behind the initial 
handler.

Instead, whenever they're all done (request_count == 0), the last one out the 
door (the handler that dropped the request_count to 0) should call into the 
consumer's handler directly with the final status.  If any of the other threads 
has received an error, all of the subsequent deliveries to the handler should 
return false, telling find "I'm going to report an error anyway, so don't 
bother recursing any more."  It's good to wait until request_count==0, even in 
an error state, so the consumer doesn't have any lame duck requests queued up 
to take care of?

Also because all of this is asynchronous, you can't allocate the lock and state 
variables on the stack.  When the consumer calls SetOwner('/', true, handler), 
the function is going to return as soon as the find operation is kicked off, 
destroying all of the elements on the stack.  We'll need to create a little 
struct for SetOwner that is maintained with a shared_ptr and cleaned up when 
the last request is done.



Minor points:
In find, perhaps recursion_counter is a bit of a misnomer at this point.  It's 
more outstanding_requests, since for big directories, we'll have more requests 
without recursing.

Perhaps FindOperationState is a better name than CurrentState, and 
SharedFindState is better than just SharedState (since we might have many 
shared states in the FileSystem class).

In CurrentState, perhaps "depth" is more accurate than position?

Do we support a globbing find without recursion?  Can I find "/dir?/path*/" 
"*.db", and not have it recurse to the sub-directories of path*?

Can we push the shims and state into the .cpp file and keep them out of the 
itnerface (even if private)?

We're very close, now.


> libhdfs++: Create tools directory and implement hdfs_cat, hdfs_chgrp, 
> hdfs_chown, hdfs_chmod and hdfs_find.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10754
>                 URL: https://issues.apache.org/jira/browse/HDFS-10754
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Anatoli Shein
>            Assignee: Anatoli Shein
>         Attachments: HDFS-10754.HDFS-8707.000.patch, 
> HDFS-10754.HDFS-8707.001.patch, HDFS-10754.HDFS-8707.002.patch, 
> HDFS-10754.HDFS-8707.003.patch, HDFS-10754.HDFS-8707.004.patch, 
> HDFS-10754.HDFS-8707.005.patch, HDFS-10754.HDFS-8707.006.patch, 
> HDFS-10754.HDFS-8707.007.patch, HDFS-10754.HDFS-8707.008.patch, 
> HDFS-10754.HDFS-8707.009.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to