On Mon, Mar 25, 2019 at 2:16 PM Pablo Estrada <[email protected]> wrote:

> +Chamikara Jayalath <[email protected]> with the new BigQuery sink,
> schema autodetection is supported (it's a very simple thing to have). Do
> you think we should not have it?
> Best
> -P.
>

Ah, good to know. But IMO users should be able to write to existing tables
without specifying a schema (when CREATE_DISPOSITION is CREATE_NEVER, for
example). How do users enable schema auto-detection? Probably this should
not be enabled by default, and we should clearly advertise that the bytes
type is not supported (or support it with extra information). Just my 2 cents.

Thanks,
Cham


>
> On Mon, Mar 25, 2019 at 11:01 AM Chamikara Jayalath <[email protected]>
> wrote:
>
>>
>>
>> On Mon, Mar 25, 2019 at 2:03 AM Juta Staes <[email protected]> wrote:
>>
>>>
>>> On Mon, 25 Mar 2019 at 06:15, Valentyn Tymofieiev <[email protected]>
>>> wrote:
>>>
>>>> We received feedback on
>>>> https://issuetracker.google.com/issues/129006689 - BQ developers say
>>>> that schema identification is done and they discourage using schema
>>>> autodetection in tables using BYTES. In light of this, I think it may be
>>>> fair to recommend that Beam users specify BQ schemas as well when they
>>>> interact with BQ, and call out that writing binary data to BQ will likely
>>>> fail unless a schema is specified. Does that make sense?
>>>>
>>>
>>> Given that schema autodetect does not work for bytes, I think it is
>>> indeed a good solution to require users to specify BQ schemas as well when
>>> they write to BQ.
>>>
>>> So new summary:
>>> 1. Beam will base64-encode raw bytes before passing them to BQ over the
>>> REST API. This will be a change in behavior for Python 2 (for good reasons).
>>> 2. When reading data from BQ, all fields of type BYTES will be
>>> base64-decoded.
>>> 3. Beam will send an API call to BigQuery to get table schema, whenever
>>> schema is not supplied, to work around
>>> https://issuetracker.google.com/issues/129006689. Beam will require
>>> users to specify the schema when writing bytes to BQ.
>>>
>>
>> I'm not sure why we reached this conclusion. We (Beam) do not use the BQ
>> schema auto detection feature currently. So why not just send an API
>> call to get the schema when users are writing to existing tables? Also,
>> even if we decide to support schema auto detection in the future, we will
>> not be able to support this for the BYTES type (due to the restriction by BQ).
>>
>>
>>> Thanks all for your input on this!
>>> Juta
>>>
>>>
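
Points 1 and 2 of the summary quoted above (base64-encode bytes on write,
base64-decode BYTES fields on read) can be illustrated with a minimal
standard-library sketch. This is not Beam's actual implementation: the helper
names encode_row_for_bq and decode_row_from_bq and the bytes_fields parameter
are hypothetical, and rows are assumed to be flat dicts. Note that the read
side has to be told which fields are BYTES, which is why the schema matters.

```python
import base64
import json


def encode_row_for_bq(row):
    """Hypothetical helper: base64-encode bytes values so the row is JSON-safe."""
    return {
        key: base64.b64encode(value).decode("ascii")
        if isinstance(value, bytes)
        else value
        for key, value in row.items()
    }


def decode_row_from_bq(row, bytes_fields):
    """Hypothetical helper: base64-decode the fields the schema marks as BYTES."""
    return {
        key: base64.b64decode(value)
        if key in bytes_fields and value is not None
        else value
        for key, value in row.items()
    }


# Round trip: raw bytes -> JSON payload -> raw bytes.
row = {"id": 1, "payload": b"\x00\xffbeam"}
wire = json.dumps(encode_row_for_bq(row))  # raw bytes would fail json.dumps
restored = decode_row_from_bq(json.loads(wire), bytes_fields={"payload"})
assert restored == row
```

Without bytes_fields (that is, without a schema) the reader cannot tell a
base64 string apart from an ordinary STRING value.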
