okumin opened a new pull request #79:
URL: https://github.com/apache/tez/pull/79


   https://issues.apache.org/jira/browse/TEZ-4246
   
   When there are just two disks, the current implementation is likely 
to use one of them for spill data and the other one for the index 
files. As a result, every `file.out`, which is larger than its `file.out.index`, 
is written to the same disk.
   
   1. write spill data to `/data/0/..../file.out`
   2. write the spill index file to the other directory, 
`/data/1/.../file.out.index`
   3. write spill data to `/data/0/..../file.out`
   4. ...
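   To make the pattern above concrete, here is a minimal Java sketch. It assumes, as a simplification, that each file (data or index) requests its own directory from a plain round-robin allocator over two directories. `RoundRobinAllocator`, `placeSpills`, and the `spillN` subdirectory are hypothetical names, not Tez or Hadoop APIs, and the real allocator may take other factors into account.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the current placement: every file, data or index,
 * asks a simple round-robin allocator for its own directory.
 * Directory names and the "spillN" subdirectory are illustrative only.
 */
public class CurrentSpillPlacement {

    /** Hypothetical round-robin allocator; not Tez's or Hadoop's real allocator. */
    static final class RoundRobinAllocator {
        private final List<String> dirs;
        private int next = 0;

        RoundRobinAllocator(List<String> dirs) {
            this.dirs = List.copyOf(dirs);
        }

        String nextDir() {
            String dir = dirs.get(next);
            next = (next + 1) % dirs.size();
            return dir;
        }
    }

    /** Returns the paths that would be written for the given number of spills. */
    static List<String> placeSpills(int spills) {
        RoundRobinAllocator allocator =
                new RoundRobinAllocator(List.of("/data/0", "/data/1"));
        List<String> written = new ArrayList<>();
        for (int i = 0; i < spills; i++) {
            // Two separate allocations per spill: with two disks, the data file
            // always lands on /data/0 and the index file always on /data/1.
            written.add(allocator.nextDir() + "/spill" + i + "/file.out");
            written.add(allocator.nextDir() + "/spill" + i + "/file.out.index");
        }
        return written;
    }

    public static void main(String[] args) {
        placeSpills(3).forEach(System.out::println);
    }
}
```

   Because the allocator advances twice per spill and there are two directories, the larger `file.out` files never alternate between disks in this model.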
   
   This PR changes the behavior so that both disks are used more 
evenly.
   
   1. write spill data to `/data/0/..../file.out`
   2. write the spill index file to the same directory, `/data/0/.../file.out.index`
   3. write spill data to `/data/1/..../file.out`
   4. ...
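   Under the same simplifying assumptions (a hypothetical round-robin allocator over two directories, with illustrative names that are not Tez APIs), the proposed placement can be sketched by allocating one directory per spill and reusing it for the index file:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the proposed placement: one directory is allocated
 * per spill, and the index file is written next to its data file.
 */
public class ProposedSpillPlacement {

    /** Hypothetical round-robin allocator; not Tez's or Hadoop's real allocator. */
    static final class RoundRobinAllocator {
        private final List<String> dirs;
        private int next = 0;

        RoundRobinAllocator(List<String> dirs) {
            this.dirs = List.copyOf(dirs);
        }

        String nextDir() {
            String dir = dirs.get(next);
            next = (next + 1) % dirs.size();
            return dir;
        }
    }

    /** Returns the paths that would be written for the given number of spills. */
    static List<String> placeSpills(int spills) {
        RoundRobinAllocator allocator =
                new RoundRobinAllocator(List.of("/data/0", "/data/1"));
        List<String> written = new ArrayList<>();
        for (int i = 0; i < spills; i++) {
            // One allocation per spill; the index file reuses the data file's directory.
            String dir = allocator.nextDir() + "/spill" + i;
            written.add(dir + "/file.out");
            written.add(dir + "/file.out.index");
        }
        return written;
    }

    /** Counts how many data files (file.out) were placed under the given directory. */
    static long dataFilesUnder(List<String> paths, String dir) {
        return paths.stream()
                .filter(p -> p.startsWith(dir + "/") && p.endsWith("/file.out"))
                .count();
    }

    public static void main(String[] args) {
        List<String> paths = placeSpills(4);
        paths.forEach(System.out::println);
        System.out.println("file.out on /data/0: " + dataFilesUnder(paths, "/data/0"));
        System.out.println("file.out on /data/1: " + dataFilesUnder(paths, "/data/1"));
    }
}
```

   In this model the data files alternate between the two disks; with an odd number of spills one disk simply holds one extra `file.out`.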
   
   Index files are relatively small, so I think it's reasonable to put them in 
the same directory as `file.out`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
