It appears that support for this kind of control over block placement is shipping in the next version of HDFS: https://issues.apache.org/jira/browse/HDFS-2576

On Tue, Aug 26, 2014 at 7:43 AM, Gary Malouf <malouf.g...@gmail.com> wrote:

> One of my colleagues has been questioning me as to why Spark/HDFS makes no
> attempts to try to co-locate related data blocks. He pointed to this
> paper: http://www.vldb.org/pvldb/vol4/p575-eltabakh.pdf from 2011 on the
> CoHadoop research and the performance improvements it yielded for
> Map/Reduce jobs.
>
> Would leveraging these ideas for writing data from Spark make sense/be
> worthwhile?