Thanks for offering to work on this! It would be awesome to have it. I can say that we don't have that for Python ATM.
On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz <sniem...@apache.org> wrote:

> Our experience has actually been that avro is more efficient than even
> parquet, but that might also be skewed from our datasets.
>
> I might try to take a crack at this, I found
> https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which
> coincidentally references my thread from a couple years ago on the read
> side of this :) ).
>
> On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax <re...@google.com> wrote:
>
>> It's been talked about, but nobody's done anything. There are some
>> difficulties related to type conversion (json and avro don't support the
>> same types), but if those are overcome then an avro version would be much
>> more efficient. I believe Parquet files would be even more efficient if you
>> wanted to go that path, but there might be more code to write (as we
>> already have some code in the codebase to convert between TableRows and
>> Avro).
>>
>> Reuven
>>
>> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz <sniem...@apache.org>
>> wrote:
>>
>>> Has anyone investigated using avro rather than json to load data into
>>> BigQuery using BigQueryIO (+ FILE_LOADS)?
>>>
>>> I'd be interested in enhancing it to support this, but I'm curious if
>>> there's any prior work here.