Has anyone looked into implementing this for the Python SDK? It would
be nice to have, if only for the ability to write float values that
are NaN or infinity. I didn't see anything in Jira. I'm happy to
create a ticket, but wanted to ask around first.
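
To illustrate the NaN/infinity point, here is a minimal sketch using only
Python's standard json module (not Beam or BigQuery code; the row below is
made up for illustration). By default json.dumps emits NaN and Infinity
tokens, which are not valid JSON per RFC 8259, and with allow_nan=False it
refuses to serialize them. Avro's double type is a 64-bit IEEE 754 value, so
it can represent both.

```python
import json

# Hypothetical row, for illustration only.
row = {"id": 1, "score": float("nan"), "limit": float("inf")}

# Default behaviour: Python writes NaN/Infinity tokens, which strict
# JSON parsers may reject because they are outside the JSON grammar.
lenient = json.dumps(row)

# Strict behaviour: serialization fails instead of emitting invalid JSON.
try:
    json.dumps(row, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(lenient)
print(strict_error)
```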

On Thu, Oct 17, 2019 at 12:53 PM Reuven Lax <re...@google.com> wrote:
>
> I'll take a look as well. Thanks for doing this!
>
> On Fri, Oct 4, 2019 at 9:16 PM Pablo Estrada <pabl...@google.com> wrote:
>>
>> Thanks Steve!
>> I'll take a look next week. Sorry about the delay so far.
>> Best
>> -P.
>>
>> On Fri, Sep 27, 2019 at 10:37 AM Steve Niemitz <sniem...@apache.org> wrote:
>>>
>>> I put up a semi-WIP pull request https://github.com/apache/beam/pull/9665 
>>> for this.  The initial results look good.  I'll spend some time soon adding 
>>> unit tests and documentation, but I'd appreciate it if someone could take a 
>>> first pass over it.
>>>
>>> On Wed, Sep 18, 2019 at 6:14 PM Pablo Estrada <pabl...@google.com> wrote:
>>>>
>>>> Thanks for offering to work on this! It would be awesome to have it. I can 
>>>> say that we don't have that for Python ATM.
>>>>
>>>> On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz <sniem...@apache.org> wrote:
>>>>>
>>>>> Our experience has actually been that Avro is more efficient than even 
>>>>> Parquet, but that might also be skewed by our datasets.
>>>>>
>>>>> I might try to take a crack at this. I found 
>>>>> https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which 
>>>>> coincidentally references my thread from a couple years ago on the read 
>>>>> side of this :) ).
>>>>>
>>>>> On Mon, Sep 16, 2019 at 1:38 PM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>> It's been talked about, but nobody's done anything. There are some 
>>>>>> difficulties related to type conversion (JSON and Avro don't support the 
>>>>>> same types), but if those are overcome then an Avro version would be 
>>>>>> much more efficient. I believe Parquet files would be even more 
>>>>>> efficient if you wanted to go that path, but there might be more code to 
>>>>>> write (as we already have some code in the codebase to convert between 
>>>>>> TableRows and Avro).
>>>>>>
>>>>>> Reuven
>>>>>>
>>>>>> On Mon, Sep 16, 2019 at 10:33 AM Steve Niemitz <sniem...@apache.org> 
>>>>>> wrote:
>>>>>>>
>>>>>>> Has anyone investigated using Avro rather than JSON to load data into 
>>>>>>> BigQuery using BigQueryIO (+ FILE_LOADS)?
>>>>>>>
>>>>>>> I'd be interested in enhancing it to support this, but I'm curious if 
>>>>>>> there's any prior work here.
