bilaharith commented on pull request #2548: URL: https://github.com/apache/hadoop/pull/2548#issuecomment-760379281
> I fear you are the same mistake we did in the S3A codebase: giving all the helper classes a reference back to the ABFS Store class, so making them too intermingled. It is unsustainable. See: https://github.com/steveloughran/engineering-proposals/blob/trunk/refactoring-s3a.md > > Proposed > > * Use a specific listing callback which the store can directly/indirectly implement: org.apache.hadoop.fs.s3a.impl.ListingOperationCallbacks. > * This will also help you write unit tests against a stub implementation without having to use reflection and manipulating access modifiers in a way that is over-complex and brittle. > * Use IOStatistics API to track duration of calls > * and serve this up. It is really interesting to know how long lists take > * add a close() operator to cancel queue & wait for completion. > > the other incremental listX calls (listFileStatus, listLocatedStatus) are all similar and should be done in the same uber-JIRA. listLocatedStatus is used in LocatedFileStatusFetcher, which is used during MR and spark file scanning. listFileStatus should be used more for deep tree scans. > > I'd also prefer if you use a shared thread pool of that ABFS store instance > > * stops many, many list iterators overloading things > * may offer faster startup > * custom thread names help identify origin of less-helpful stack traces > * could have a gauge on the instance to measure pool load > * filesystem.close() can interrupt the pool to shut things down. > > Have a look @ org.apache.hadoop.fs.s3a.Listing in trunk to see what's been done there. There's a lot you don't need (S3Guard reconciliation), and it currently only prefetches the next page of results. But it does include the stats collection, a restricted callback to the FS, use of org.apache.hadoop.util.functional.RemoteIterators for functional-programming style use of RemoteIterator classes. Take a look at it's {{org.apache.hadoop.util.functional.RemoteIterators#foreach}} method too. > > I know, I'm giving extra homework. But I'll only keep asking for these, so better to start now. I will raise the JIRA as suggested. For now keeping the PR as a draft. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
