Steve Loughran commented on HADOOP-13371:

Thinking of closing PR 204 and tagging this as a wontfix, at least for now. 
It's not that I'm a fan of the current globber code; it's that

# with HADOOP-13345 in authoritative mode, DynamoDB offers the high-performance recursive treewalk the current globber needs
# and it offers the consistency which FS client code demands

Given that #2 is an absolute requirement for safe use of S3 as a source of data 
in any workflow which chains together queries, it's hard to justify going near 

Sorry; thanks for your contribution here, but, like mine, it's safest to put 

We do have lots of other outstanding S3 tasks (HADOOP-14831, HADOOP-15220) 
which need care and attention.

> S3A globber to use bulk listObject call over recursive directory scan
> ---------------------------------------------------------------------
>                 Key: HADOOP-13371
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13371
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs, fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
> HADOOP-13208 produces O(1) listing of directory trees in 
> {{FileSystem.listStatus}} calls, but doesn't do anything for 
> {{FileSystem.globStatus()}}, which uses a completely different codepath, one 
> which does a selective recursive scan by pattern matching as it goes down, 
> filtering out those patterns which don't match. Cost is 
> O(matching-directories) + cost of examining the files.
> It should be possible to do the glob status listing in S3A not through the 
> filtered treewalk, but through a list + filter operation. This would be an 
> O(files) lookup *before any filtering took place*.
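
To make the cost difference between the two codepaths concrete, here is a minimal, self-contained Java sketch. It is not S3A or Hadoop code: the class `GlobSketch`, its methods, and the in-memory `TreeSet` standing in for a bucket's object keys are all hypothetical. `treewalk` expands the glob one path component at a time and issues one simulated LIST per directory that still matches, while `listAndFilter` does a single flat scan of every key and only then applies the pattern. For brevity it assumes only `*` and `?` wildcards (neither crossing `/`) and returns matching files only, not directories implied by key prefixes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Hypothetical sketch comparing a pattern-directed treewalk glob with a flat list + filter glob. */
public class GlobSketch {

    /** Simulated count of per-directory LIST calls made by treewalk (illustration only). */
    static int listCalls = 0;

    /** Translate a simple glob ('*' and '?' only) to a regex; neither wildcard crosses '/'. */
    static Pattern globToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*': sb.append("[^/]*"); break;
                case '?': sb.append("[^/]"); break;
                default: sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Flat approach: examine every key (O(files)) before any filtering, then keep the matches. */
    static List<String> listAndFilter(TreeSet<String> keys, String glob) {
        Pattern p = globToRegex(glob);
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            if (p.matcher(key).matches()) out.add(key);
        }
        return out;
    }

    /** Immediate children of a directory prefix: files as full keys, subdirectories as "prefix/". */
    static TreeSet<String> listChildren(TreeSet<String> keys, String prefix) {
        TreeSet<String> children = new TreeSet<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) break;  // sorted keys: we have left the prefix range
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            children.add(slash < 0 ? key : prefix + rest.substring(0, slash + 1));
        }
        return children;
    }

    /** Treewalk approach: list only directories whose path components still match the glob. */
    static List<String> treewalk(TreeSet<String> keys, String glob) {
        String[] parts = glob.split("/");
        List<String> dirs = new ArrayList<>(List.of(""));  // matched directory prefixes so far
        List<String> out = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            Pattern p = globToRegex(parts[i]);
            boolean lastPart = i == parts.length - 1;
            List<String> nextDirs = new ArrayList<>();
            for (String dir : dirs) {
                listCalls++;  // one simulated LIST per surviving directory
                for (String child : listChildren(keys, dir)) {
                    boolean isDir = child.endsWith("/");
                    String name = child.substring(dir.length(), child.length() - (isDir ? 1 : 0));
                    if (!p.matcher(name).matches()) continue;  // prune non-matching branches
                    if (lastPart && !isDir) out.add(child);
                    else if (!lastPart && isDir) nextDirs.add(child);
                }
            }
            dirs = nextDirs;
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(List.of(
                "data/2017/part-0", "data/2018/part-0", "data/2018/part-1", "logs/app.log"));
        String glob = "data/*/part-*";
        System.out.println("flat:     " + listAndFilter(keys, glob));
        System.out.println("treewalk: " + treewalk(keys, glob) + " using " + listCalls + " LIST calls");
    }
}
```

Both methods return the same matches here; the difference is that the treewalk's request count grows with the number of matching directories, while the flat scan makes one bulk listing whose cost grows with the total number of keys under the starting prefix, including the ones the filter then discards.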

This message was sent by Atlassian JIRA