Hi,

I have been familiarizing myself with the Drill codebase, but I need some advice on how drillbits are chosen for execution. As a specific case, suppose I want to scan a file on HDFS and merge the results to compute a simple sum over an integer column. The file is already spread across the cluster. Two points are worth noting:

1. Some affinity could be calculated statically from block placement, if the drillbits are running on the same nodes as the HDFS data nodes.
2. When the partial results are ready on all drillbits, some of them have to be transferred to one single drillbit, and choosing that drillbit needs some dynamic parameters as input. The process could become more complicated if the intermediate results are merged in stages.

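To make point 1 concrete, here is a rough sketch of the kind of static affinity I have in mind. It is only an illustration: the names `AffinitySketch`, `Block`, and `affinity` are made up for this example and are not taken from the Drill source, and scoring each host by the fraction of the file's bytes it holds locally is my own assumption about how such a value could be defined.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Drill code: score each host by how much of the
// file it stores locally, based only on HDFS block placement.
public class AffinitySketch {

    /** One HDFS block: its length in bytes and the hosts holding a replica. */
    public static final class Block {
        final long length;
        final List<String> hosts;

        public Block(long length, List<String> hosts) {
            this.length = length;
            this.hosts = hosts;
        }
    }

    /**
     * Returns, for every host that stores at least one replica, the fraction
     * of the file's bytes that the host could read locally. Because blocks
     * are replicated, the fractions can add up to more than 1.
     */
    public static Map<String, Double> affinity(List<Block> blocks) {
        Map<String, Long> localBytes = new HashMap<>();
        long totalBytes = 0;
        for (Block block : blocks) {
            totalBytes += block.length;
            for (String host : block.hosts) {
                localBytes.merge(host, block.length, Long::sum);
            }
        }

        Map<String, Double> result = new HashMap<>();
        if (totalBytes == 0) {
            return result; // empty file: no host has any affinity
        }
        for (Map.Entry<String, Long> entry : localBytes.entrySet()) {
            result.put(entry.getKey(), (double) entry.getValue() / totalBytes);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(128, List.of("node1", "node2")),
            new Block(128, List.of("node2", "node3")),
            new Block(64, List.of("node1", "node3")));
        System.out.println(affinity(blocks));
    }
}
```

A planner could then prefer drillbits on the hosts with the highest values when assigning scan work. Whether Drill does something along these lines is exactly what I am trying to find out.
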
I can't find the place where the decisions for the above cases are made, so any pointer to the source or documentation would help!

Thanks!
Jaguar
