I've only ever seen that used to figure out which file the runner/mapper/operation is working on. Beyond that, I haven't seen those operations care where in the file they're working.

On Thu, Sep 22, 2016 at 5:57 AM Amit Sela <amitsel...@gmail.com> wrote:

> Wouldn't it force all runners to implement this for all distributed
> filesystems ? It's true that each runner has it's own "partitioning"
> mechanism, but I assume (maybe I'm wrong) that open-source runners use the
> Hadoop InputFormat/InputSplit for that.. and the proper connectors for that
> to run on top of s3/gs.
>
> If this is wrong, each runner should take care of it's own, but if not, we
> could have a generic solution for runners, no ?
>
> Thanks,
> Amit
>
> On Thu, Sep 22, 2016 at 3:30 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi Amit,
> >
> > as the purpose is to remove IOChannelFactory, then I would suggest it's
> > a runner concern. The Read.Bounded should "locate" the bundles on a
> > executor close to the read data (even if it's not always possible
> > depending of the source).
> >
> > My $0.01
> >
> > Regards
> > JB
> >
> > On 09/22/2016 02:26 PM, Amit Sela wrote:
> > > It's not new that batch pipeline can optimize on data locality, my
> > > question is regarding this responsibility in Beam.
> > > If runners should implement a generic Read.Bounded support, should they
> > > also implement locating the input blocks ? or should it be a part
> > > of IOChannelFactory implementations ? or another way to go at it that
> > > I'm missing ?
> > >
> > > Thanks,
> > > Amit.
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >