Re: using avro instead of json for BigQueryIO.Write

Pablo Estrada Wed, 18 Sep 2019 15:15:23 -0700

Thanks for offering to work on this! It would be awesome to have it. I can
say that we don't have that for Python ATM.


On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz <[email protected]> wrote:

> Our experience has actually been that avro is more efficient than even
> parquet, but that might also be skewed from our datasets.
>
> I might try to take a crack at this. I found
> https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which
> coincidentally references my thread from a couple years ago on the read
> side of this :) ).
>
> On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax <[email protected]> wrote:
>
>> It's been talked about, but nobody's done anything. There are some
>> difficulties related to type conversion (json and avro don't support the
>> same types), but if those are overcome then an avro version would be much
>> more efficient. I believe Parquet files would be even more efficient if you
>> wanted to go that path, but there might be more code to write (as we
>> already have some code in the codebase to convert between TableRows and
>> Avro).
>>
>> Reuven
>>
>> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz <[email protected]>
>> wrote:
>>
>>> Has anyone investigated using avro rather than json to load data into
>>> BigQuery using BigQueryIO (+ FILE_LOADS)?
>>>
>>> I'd be interested in enhancing it to support this, but I'm curious if
>>> there's any prior work here.
>>>
>>
