Hi, Min. In DefaultTaskScheduler, each container is mapped to one of the disks across all nodes in the cluster. When a container requests a task, DefaultTaskScheduler selects the closest task and assigns it to that container. This process works only for local reads; the disk volume information is not considered for remote reads.
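In case it helps, here is a minimal Java sketch of the selection order described above (disk-local first, then host-local, then remote with volume information ignored). The class and method names are illustrative only, not Tajo's actual DefaultTaskScheduler code:

import java.util.*;

// Simplified model: a leaf task plus the host/volume holding its block replica.
class LeafTask {
    final String taskId;
    final String host;   // host holding a replica of the task's input block
    final int volumeId;  // disk volume id exposed by HDFS-3672
    LeafTask(String taskId, String host, int volumeId) {
        this.taskId = taskId; this.host = host; this.volumeId = volumeId;
    }
}

public class VolumeAwareSchedulerSketch {
    // Pending tasks, grouped by host for quick locality lookup.
    private final Map<String, List<LeafTask>> tasksByHost = new HashMap<>();
    private final List<LeafTask> allTasks = new ArrayList<>();

    void addTask(LeafTask t) {
        tasksByHost.computeIfAbsent(t.host, h -> new ArrayList<>()).add(t);
        allTasks.add(t);
    }

    // A container on 'host', bound to disk 'volumeId', asks for work.
    LeafTask assignToLeafTask(String host, int volumeId) {
        List<LeafTask> local = tasksByHost.getOrDefault(host, Collections.emptyList());
        // 1) Prefer a task whose block lives on the same disk volume (disk-local).
        for (LeafTask t : local) {
            if (t.volumeId == volumeId) { remove(t); return t; }
        }
        // 2) Otherwise any task on the same host (host-local, different disk).
        if (!local.isEmpty()) {
            LeafTask t = local.get(0); remove(t); return t;
        }
        // 3) Fall back to a remote task; volume information is ignored here,
        //    matching the behavior described above.
        if (!allTasks.isEmpty()) {
            LeafTask t = allTasks.get(0); remove(t); return t;
        }
        return null; // nothing left to schedule
    }

    private void remove(LeafTask t) {
        allTasks.remove(t);
        List<LeafTask> l = tasksByHost.get(t.host);
        if (l != null) l.remove(t);
    }
}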
In my opinion, this is enough for us because there are few remote tasks in each subquery. In a test on an in-house cluster of 32 nodes, the ratio of remote tasks to total tasks was only about 0.17% (the query was 'select l_orderkey from lineitem', and the lineitem table was about 1 TB). Since the number of remote tasks was very small, there was little disk contention.

Hope that answers your questions.

Thanks,
Jihoon

2014-02-13 11:00 GMT+09:00 Min Zhou <[email protected]>:
> Hi all,
>
> Tajo leverages the feature supported by HDFS-3672, which exposes the disk
> volume id of each hdfs data block. I already found the related code in
> DefaultTaskScheduler.assignToLeafTasks, can anyone explain the logic for
> me? What does the scheduler do when the hdfs read is a remote read on the
> other machine's disk?
>
>
> Thanks,
> Min
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
