I put up a semi-WIP pull request https://github.com/apache/beam/pull/9665 for
this.  The initial results look good.  I'll spend some time soon adding
unit tests and documentation, but I'd appreciate it if someone could take a
first pass over it.

On Wed, Sep 18, 2019 at 6:14 PM Pablo Estrada <pabl...@google.com> wrote:

> Thanks for offering to work on this! It would be awesome to have it. I can
> say that we don't have that for Python ATM.
>
> On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz <sniem...@apache.org>
> wrote:
>
>> Our experience has actually been that avro is more efficient than even
>> parquet, but that might also be skewed by our datasets.
>>
>> I might try to take a crack at this. I found
>> https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which
>> coincidentally references my thread from a couple years ago on the read
>> side of this :) ).
>>
>> On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax <re...@google.com> wrote:
>>
>>> It's been talked about, but nobody's done anything. There are some
>>> difficulties related to type conversion (json and avro don't support the
>>> same types), but if those are overcome then an avro version would be much
>>> more efficient. I believe Parquet files would be even more efficient if you
>>> wanted to go that path, but there might be more code to write (as we
>>> already have some code in the codebase to convert between TableRows and
>>> Avro).
>>>
>>> Reuven
>>>
>>> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz <sniem...@apache.org>
>>> wrote:
>>>
>>>> Has anyone investigated using avro rather than json to load data into
>>>> BigQuery using BigQueryIO (+ FILE_LOADS)?
>>>>
>>>> I'd be interested in enhancing it to support this, but I'm curious if
>>>> there's any prior work here.
>>>>
>>>
