Data locality with regard to HDFS

熊贻青 Sat, 08 Mar 2014 21:41:24 -0800

Hi,
I have been familiarizing myself with the Drill codebase, but I would like some
advice on how drillbits are chosen for execution. Consider a specific case:
I'd like to scan a file on HDFS and merge the results to get a simple sum
over an integer column. The file is already spread across the cluster, so
two points are worth noting:
1. Some affinity could be calculated (statically) with respect to block
placement, if the drillbits are running on the same nodes as the HDFS DataNodes.
2. When partial results are ready from all drillbits, some of them need to be
transferred to a single drillbit, and choosing that drillbit requires some
(dynamic) parameters as input. The process could become more complicated if
the intermediate results are merged in stages.
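To make the two points concrete, here is a minimal Java sketch. It is hypothetical and not taken from the Drill codebase: the names `LocalitySketch`, `Block`, `affinity` and `stagedSum` are invented for illustration. It assumes each block's length and replica hosts are already known. In a real setup they could come from HDFS, for example via Hadoop's `FileSystem.getFileBlockLocations`. Point 1 is modeled as "fraction of the file's bytes stored on each host". Point 2 is modeled as a tree-shaped merge of partial sums with a fixed fan-in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; not Drill's planner code. */
public class LocalitySketch {

    /** Hypothetical stand-in for an HDFS block: its length and replica hosts. */
    static final class Block {
        final long length;
        final List<String> hosts;

        Block(long length, List<String> hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    /**
     * Point 1 (static): for each host, the fraction of the file's bytes that have
     * a local replica there. Values across hosts can sum to more than 1 because
     * each block is replicated on several hosts.
     */
    static Map<String, Double> affinity(List<Block> blocks) {
        long total = 0;
        Map<String, Long> localBytes = new HashMap<>();
        for (Block b : blocks) {
            total += b.length;
            for (String host : b.hosts) {
                localBytes.merge(host, b.length, Long::sum);
            }
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Long> e : localBytes.entrySet()) {
            result.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return result;
    }

    /**
     * Point 2 (staged merge): repeatedly combine groups of `fanIn` partial sums
     * until a single value remains; each loop iteration is one merge stage.
     */
    static long stagedSum(List<Long> partials, int fanIn) {
        if (fanIn < 2) {
            throw new IllegalArgumentException("fanIn must be >= 2");
        }
        if (partials.isEmpty()) {
            return 0L;
        }
        List<Long> level = new ArrayList<>(partials);
        while (level.size() > 1) {
            List<Long> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += fanIn) {
                long s = 0;
                for (int j = i; j < Math.min(i + fanIn, level.size()); j++) {
                    s += level.get(j);
                }
                next.add(s);
            }
            level = next;
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
                new Block(128, List.of("node1", "node2", "node3")),
                new Block(128, List.of("node2", "node3", "node4")),
                new Block(64, List.of("node1", "node4", "node5")));
        // Per-host locality fractions (HashMap print order is unspecified).
        System.out.println(affinity(blocks));
        // Five partial sums merged with fan-in 2: [30, 70, 50] -> [100, 50] -> [150].
        System.out.println(stagedSum(List.of(10L, 20L, 30L, 40L, 50L), 2));
    }
}
```

In this sketch a larger `fanIn` means fewer merge stages but more incoming data at each merging drillbit. How Drill actually chooses the merge tree and the receiving drillbit is exactly the open question here.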


I can't find where the decisions for the above cases are made, so any
pointer into the source or the documentation would help!

Thanks!
Jaguar
