Gary, do you mean Spark and HDFS separately, or Spark's use of HDFS?

If the former, Spark does support copartitioning.
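
To make "co-partitioning" concrete, here is a minimal plain-Python sketch of the idea. It is not Spark code: partition_of, partition_by and copartitioned_join are hypothetical helpers made up for illustration. In Spark itself the equivalent is to call partitionBy with the same partitioner on both pair RDDs before joining them. The point is that when both datasets use the same partitioner and the same partition count, matching keys land at the same partition index, so the join can run partition by partition.

```python
import zlib
from collections import defaultdict


def partition_of(key, num_partitions):
    """Deterministic hash partitioner (hypothetical helper, not a Spark API)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def partition_by(records, num_partitions):
    """Split (key, value) records into num_partitions buckets by key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[partition_of(key, num_partitions)].append((key, value))
    return parts


def copartitioned_join(left_parts, right_parts):
    """Join partition i of the left side only with partition i of the right.

    This is valid only because both sides were split with the same
    partitioner and partition count, so no record has to move.
    """
    if len(left_parts) != len(right_parts):
        raise ValueError("datasets are not co-partitioned")
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Build a per-partition lookup table for the right side.
        index = defaultdict(list)
        for key, value in right:
            index[key].append(value)
        for key, value in left:
            for other in index.get(key, []):
                joined.append((key, (value, other)))
    return joined


# Made-up sample data, for illustration only.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")]

user_parts = partition_by(users, 4)
order_parts = partition_by(orders, 4)
result = sorted(copartitioned_join(user_parts, order_parts))
print(result)
```

Note this only covers how records are grouped within a job. Where HDFS physically puts the underlying blocks is a separate matter.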

If the latter, block placement is an HDFS concern that is outside Spark's
scope. On that note, Hadoop does also make attempts to co-locate data, e.g.,
rack awareness. I'm sure the paper makes useful contributions for its set of
use cases.

Sent while mobile. Pls excuse typos etc.
On Aug 26, 2014 5:21 AM, "Gary Malouf" <malouf.g...@gmail.com> wrote:

> It appears support for this type of control over block placement is going
> out in the next version of HDFS:
> https://issues.apache.org/jira/browse/HDFS-2576
>
>
> On Tue, Aug 26, 2014 at 7:43 AM, Gary Malouf <malouf.g...@gmail.com>
> wrote:
>
> > One of my colleagues has been questioning me as to why Spark/HDFS makes
> > no attempts to try to co-locate related data blocks.  He pointed to this
> > paper: http://www.vldb.org/pvldb/vol4/p575-eltabakh.pdf from 2011 on the
> > CoHadoop research and the performance improvements it yielded for
> > Map/Reduce jobs.
> >
> > Would leveraging these ideas for writing data from Spark make sense/be
> > worthwhile?
> >
> >
> >
>
