The file splitter and block reader combination allows multiple partitions
to read files in parallel by dividing the files into blocks. Does anyone
have any ideas on how to make the block readers data-local to the blocks
they are reading?

I think we will need to spawn block readers on every node where the blocks
are present, and then route each block's meta information to the
appropriate block reader. If the readers are reading multiple files, this
could mean all the nodes in the cluster.
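
To make the routing idea concrete, here is a rough sketch in plain Java. It
is only an illustration under assumptions, not existing splitter or reader
code: the names LocalityRouter, BlockMeta and route are hypothetical, and I
am assuming the splitter can attach the replica hosts to each block's meta
information (on HDFS these would presumably come from
FileSystem.getFileBlockLocations and BlockLocation.getHosts). Each block goes
to the least-loaded reader running on a host that holds a replica, falling
back to the least-loaded reader overall when no reader is local.

```java
import java.util.List;

public class LocalityRouter {

    /** Hypothetical block meta information, extended with replica hosts. */
    public static final class BlockMeta {
        final String file;
        final long offset;
        final long length;
        final List<String> hosts; // nodes that hold a replica of this block

        public BlockMeta(String file, long offset, long length, List<String> hosts) {
            this.file = file;
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // readerHosts.get(i) is the node on which reader partition i is running.
    private final List<String> readerHosts;
    // load[i] counts the blocks routed to reader partition i so far.
    private final int[] load;

    public LocalityRouter(List<String> readerHosts) {
        if (readerHosts.isEmpty()) {
            throw new IllegalArgumentException("at least one reader is required");
        }
        this.readerHosts = readerHosts;
        this.load = new int[readerHosts.size()];
    }

    /** Returns the index of the reader partition that should read the block. */
    public int route(BlockMeta block) {
        int best = -1;
        // First choice: the least-loaded reader on a host holding a replica.
        for (int i = 0; i < readerHosts.size(); i++) {
            if (block.hosts.contains(readerHosts.get(i))
                    && (best < 0 || load[i] < load[best])) {
                best = i;
            }
        }
        // Fallback: no reader is local, so pick the least-loaded reader overall.
        if (best < 0) {
            best = 0;
            for (int i = 1; i < load.length; i++) {
                if (load[i] < load[best]) {
                    best = i;
                }
            }
        }
        load[best]++;
        return best;
    }
}
```

The open part is still how to get one reader partition deployed on each node
in the first place; the sketch assumes that has already happened and only
covers the routing step.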

Thanks
