I'll take a look as well. Thanks for doing this!

On Fri, Oct 4, 2019 at 9:16 PM Pablo Estrada <pabl...@google.com> wrote:
> Thanks Steve!
> I'll take a look next week. Sorry about the delay so far.
> Best
> -P.
>
> On Fri, Sep 27, 2019 at 10:37 AM Steve Niemitz <sniem...@apache.org>
> wrote:
>
>> I put up a semi-WIP pull request https://github.com/apache/beam/pull/9665 for
>> this. The initial results look good. I'll spend some time soon adding
>> unit tests and documentation, but I'd appreciate it if someone could take a
>> first pass over it.
>>
>> On Wed, Sep 18, 2019 at 6:14 PM Pablo Estrada <pabl...@google.com> wrote:
>>
>>> Thanks for offering to work on this! It would be awesome to have it. I
>>> can say that we don't have that for Python ATM.
>>>
>>> On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz <sniem...@apache.org>
>>> wrote:
>>>
>>>> Our experience has actually been that avro is more efficient than even
>>>> parquet, but that might also be skewed from our datasets.
>>>>
>>>> I might try to take a crack at this, I found
>>>> https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which
>>>> coincidentally references my thread from a couple years ago on the read
>>>> side of this :) ).
>>>>
>>>> On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> It's been talked about, but nobody's done anything. There are some
>>>>> difficulties related to type conversion (json and avro don't support the
>>>>> same types), but if those are overcome then an avro version would be much
>>>>> more efficient. I believe Parquet files would be even more efficient if
>>>>> you wanted to go that path, but there might be more code to write (as we
>>>>> already have some code in the codebase to convert between TableRows and
>>>>> Avro).
>>>>>
>>>>> Reuven
>>>>>
>>>>> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz <sniem...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Has anyone investigated using avro rather than json to load data into
>>>>>> BigQuery using BigQueryIO (+ FILE_LOADS)?
>>>>>>
>>>>>> I'd be interested in enhancing it to support this, but I'm curious if
>>>>>> there's any prior work here.