Re: Preferred locations (or data locality) for batch pipelines.

Amit Sela Thu, 22 Sep 2016 05:58:13 -0700

Wouldn't it force all runners to implement this for all distributed
filesystems? It's true that each runner has its own "partitioning"
mechanism, but I assume (maybe I'm wrong) that open-source runners use the
Hadoop InputFormat/InputSplit for that, along with the proper connectors
to run on top of s3/gs.


If this is wrong, each runner should take care of its own, but if not, we
could have a generic solution for runners, no?
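
To make the "generic solution" idea a bit more concrete, here is a rough
sketch of what locality-aware placement might look like. It is not Beam or
Hadoop code: the `Split` type, the `assign` method, and the host names are
all hypothetical, and the placement rule (prefer an executor that already
holds the data, otherwise fall back to the least-loaded executor) is only an
assumption about what a runner could do. In a Hadoop-based runner the
preferred hosts would presumably come from something like
InputSplit.getLocations().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are Beam or Hadoop APIs.
class LocalityAssignment {

    /** A piece of bounded input plus the hosts holding its data (empty for s3/gs-like stores). */
    static final class Split {
        final String name;
        final List<String> preferredHosts;

        Split(String name, List<String> preferredHosts) {
            this.name = name;
            this.preferredHosts = preferredHosts;
        }
    }

    /**
     * Assigns each split to the least-loaded executor among its preferred hosts.
     * If no preferred host is an executor, falls back to the least-loaded executor overall.
     */
    static Map<String, List<String>> assign(List<Split> splits, List<String> executors) {
        if (executors.isEmpty()) {
            throw new IllegalArgumentException("at least one executor is required");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String executor : executors) {
            plan.put(executor, new ArrayList<>());
        }
        for (Split split : splits) {
            // First try the hosts that already hold the data.
            String target = leastLoaded(split.preferredHosts, plan);
            if (target == null) {
                // No locality information, or no executor on those hosts.
                target = leastLoaded(executors, plan);
            }
            plan.get(target).add(split.name);
        }
        return plan;
    }

    /** Returns the candidate executor with the fewest assigned splits, or null if none is an executor. */
    private static String leastLoaded(List<String> candidates, Map<String, List<String>> plan) {
        String best = null;
        for (String host : candidates) {
            if (!plan.containsKey(host)) {
                continue; // this host is not running an executor
            }
            if (best == null || plan.get(host).size() < plan.get(best).size()) {
                best = host;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> executors = Arrays.asList("node-a", "node-b");
        List<Split> splits = Arrays.asList(
            new Split("block-0", Arrays.asList("node-a", "node-c")),
            new Split("block-1", Arrays.asList("node-b")),
            new Split("block-2", new ArrayList<String>()), // e.g. an object store, no locality
            new Split("block-3", Arrays.asList("node-a")));
        System.out.println(assign(splits, executors));
    }
}
```

The fallback branch is the interesting part for s3/gs: when a source reports
no locations, placement degrades to plain load balancing, so a generic
mechanism like this would not require every filesystem to support locality.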

Thanks,
Amit

On Thu, Sep 22, 2016 at 3:30 PM Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi Amit,
>
> as the purpose is to remove IOChannelFactory, I would suggest it's
> a runner concern. The Read.Bounded should "locate" the bundles on an
> executor close to the read data (even if it's not always possible,
> depending on the source).
>
> My $0.01
>
> Regards
> JB
>
> On 09/22/2016 02:26 PM, Amit Sela wrote:
> > It's not new that batch pipelines can optimize on data locality; my
> > question is regarding this responsibility in Beam.
> > If runners should implement a generic Read.Bounded support, should they
> > also implement locating the input blocks? Or should it be a part
> > of IOChannelFactory implementations? Or is there another way to go at it
> > that I'm missing?
> >
> > Thanks,
> > Amit.
> >
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
