+1 to generalizing the header-passing logic of TextIO to support other
formats such as VCF and CSV. I think it'll still be useful to have a VcfIO,
though, that reads the header plus lines for a bundle and produces a
PCollection of VCF record protos.
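
To make the "header + lines for a bundle" idea concrete, here is a rough
sketch in plain Python. It does not use any Beam API, and every name in it
(VcfRecord, sample_names, parse_line, read_bundle) is hypothetical. It
assumes the header lines have already been handed to whoever processes the
bundle, and it keeps only a few VCF columns to stay short.

```python
from typing import Dict, List, NamedTuple


class VcfRecord(NamedTuple):
    """Hypothetical stand-in for a VCF record proto (a few fields only)."""
    chrom: str
    pos: int
    ref: str
    alt: List[str]
    calls: Dict[str, str]  # sample name -> raw per-sample column text


def sample_names(header_lines: List[str]) -> List[str]:
    """Return sample names from the '#CHROM ...' column header line."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # The first 9 columns are fixed (CHROM..FORMAT); the rest are samples.
            return columns[9:]
    return []


def parse_line(line: str, samples: List[str]) -> VcfRecord:
    """Parse one tab-separated data line using sample names from the header."""
    fields = line.rstrip("\n").split("\t")
    return VcfRecord(
        chrom=fields[0],
        pos=int(fields[1]),
        ref=fields[3],
        alt=fields[4].split(","),
        calls=dict(zip(samples, fields[9:])),
    )


def read_bundle(header_lines: List[str], body_lines: List[str]) -> List[VcfRecord]:
    """Combine the shared header with the data lines of a single bundle."""
    samples = sample_names(header_lines)
    return [
        parse_line(line, samples)
        for line in body_lines
        if line.strip() and not line.startswith("#")
    ]
```

The point is only that the per-line parsing needs something from the header
(here, the sample names), which is why the header has to reach every bundle.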

Thanks,
Cham

On Thu, Aug 17, 2017 at 11:23 AM Eugene Kirpichov
<[email protected]> wrote:

> I really like the idea of generalizing TextIO to be able to read a file
> header while still reading the rest of the contents in parallel. People
> have long been asking for this for CSV. If we add that, a special VcfIO
> will not be necessary because you'll be able to just use the enhanced
> TextIO and parse VCF from the lines and headers (granted, it still makes
> sense to have this as a library, just not necessarily packaged as a
> PTransform).
>
> On Thu, Aug 17, 2017, 10:35 AM Reuven Lax <[email protected]>
> wrote:
>
> > I think this approach should not be that hard. We need to see if some of
> > the code in TextSource needs to be refactored, as TextSource is currently
> > package private.
> >
> > On Wed, Aug 16, 2017 at 12:04 PM, Chamikara Jayalath
> > <[email protected]> wrote:
> >
> > > Thanks for proposing this.
> > >
> > > I left some comments. My main concern is the complexity this might add
> > > to TextIO and the potential performance impact. So at this point I'd
> > > prefer that this be implemented as a new FileBasedSource instead of
> > > updating TextIO. I'm open to being convinced otherwise :).
> > >
> > > Thanks,
> > > Cham
> > >
> > > On Wed, Aug 16, 2017 at 11:01 AM Eugene Kirpichov
> > > <[email protected]> wrote:
> > >
> > > > +Chamikara Jayalath <[email protected]>
> > > > Also you may find useful the recent discussion on WholeFileIO
> > > >
> > > > https://lists.apache.org/thread.html/6ea193b7178f8ab44de5562bfdd94dc3fe740bc440e8a05e533e40cf@%3Cdev.beam.apache.org%3E
> > > > https://github.com/apache/beam/pull/3543 (I think bulk of discussion
> > > > happened there)
> > > > https://github.com/apache/beam/pull/3717
> > > >
> > > >
> > > > On Wed, Aug 16, 2017 at 10:58 AM Jean-Baptiste Onofré
> > > > <[email protected]> wrote:
> > > >
> > > > > I will, thanks!
> > > > >
> > > > > Regards
> > > > > JB
> > > > >
> > > > > On Aug 16, 2017, at 18:53, Asha Rostamianfar
> > > > > <[email protected]> wrote:
> > > > > >Hi everyone,
> > > > > >
> > > > > >I have a proposal to add a new built-in I/O source for VCF files:
> > > > > >
> > > > > >https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit
> > > > > >
> > > > > >I'm planning to take on the implementation work myself, but wanted
> > > > > >to get preliminary feedback about the proposed design as it
> > > > > >requires making changes to the existing TextIO. I will file a JIRA
> > > > > >FR as well.
> > > > > >
> > > > > >Please take a look at the doc and feel free to comment.
> > > > > >
> > > > > >Thanks,
> > > > > >Asha
> > > > >
> > > >
> > >
> >
>
