Exposing Spark parallelized directory listing & non-locality listing in core

Holden Karau Tue, 21 Jul 2020 16:59:25 -0700

Hi Folks,

In Spark SQL, Spark can do its partition discovery/file listing in
parallel on the worker nodes and can also skip locality lookups. I'd like
to expose this in core, but given the Hadoop APIs it's a bit more
complicated to do right. I made a quick POC with two different potential
implementation paths and wanted to see if anyone had thoughts:
https://github.com/apache/spark/pull/29179.
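As a rough local illustration of the two ideas (this is not Spark's API and not the approach taken in the PR), the sketch below fans out one listing task per directory once the number of directories passes a threshold, and optionally skips the per-file locality lookup. Spark SQL gates its distributed listing on `spark.sql.sources.parallelPartitionDiscovery.threshold`. Here a Python thread pool stands in for Spark's worker nodes. The names `lookup_locations`, `parallel_list`, `threshold`, and `ignore_locality` are hypothetical and chosen only for this sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple


def lookup_locations(path: str) -> List[str]:
    # Hypothetical stand-in for an expensive locality lookup (for example,
    # fetching HDFS block locations). A local filesystem has no block
    # hosts, so this simply returns an empty list.
    return []


def list_leaf_files(root: str) -> List[str]:
    # Recursively list every regular file under one directory.
    # In Spark this work would run as a task on a worker node.
    out = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            out.append(os.path.join(dirpath, name))
    return out


def parallel_list(
    dirs: List[str],
    threshold: int = 2,
    workers: int = 4,
    ignore_locality: bool = True,
) -> List[Tuple[str, Optional[List[str]]]]:
    # Small inputs are listed serially; past the threshold, each directory
    # becomes its own parallel task (a thread here, a Spark task in core).
    if len(dirs) <= threshold:
        listings = [list_leaf_files(d) for d in dirs]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            listings = list(pool.map(list_leaf_files, dirs))
    files = sorted(f for listing in listings for f in listing)
    # Skipping the locality lookup makes listing cheaper, at the cost of
    # the scheduler not knowing where each file's data lives.
    if ignore_locality:
        return [(f, None) for f in files]
    return [(f, lookup_locations(f)) for f in files]
```

The threshold keeps small listings on the caller, since launching parallel tasks has fixed overhead that only pays off once there are enough directories.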


Cheers,

Holden

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
