On Mon, Mar 25, 2019 at 2:03 AM Juta Staes <[email protected]> wrote:

>
> On Mon, 25 Mar 2019 at 06:15, Valentyn Tymofieiev <[email protected]>
> wrote:
>
>> We received feedback on https://issuetracker.google.com/issues/129006689 -
>> BQ developers say that schema identification is done and they discourage
>> using schema autodetection in tables using BYTES. In light of this, I think
>> it may be fair to recommend that Beam users specify BQ schemas as well when
>> they interact with BQ, and call out that writing binary data to BQ will
>> likely fail unless a schema is specified. Does that make sense?
>>
>
> Given that schema autodetect does not work for bytes I think it is indeed
> a good solution to require users to specify BQ schemas as well when they
> write to BQ.
>
> So new summary:
> 1. Beam will base64-encode raw bytes before passing them to BQ over the REST
> API. This will be a change in behavior for Python 2 (for good reasons).
> 2. When reading data from BQ, all fields of type BYTES will be
> base64-decoded.
> 3. Beam will send an API call to BigQuery to get table schema, whenever
> schema is not supplied, to work around
> https://issuetracker.google.com/issues/129006689. Beam will require users
> to specify the schema when writing bytes to BQ.
>

I'm not sure why we reached this conclusion. We (Beam) do not currently use
the BQ schema auto-detection feature. So why not just send an API call to get
the schema when users are writing to existing tables? Also, even if we decide
to support schema auto-detection in the future, we will not be able to
support it for the BYTES type (due to the restriction by BQ).
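
To make the proposal concrete, here is a rough sketch of what I have in mind. It is not Beam's actual code. `fetch_table_schema` is a hypothetical callable standing in for the API call that reads the schema of an existing table, and the schema layout (a list of dicts with "name" and "type" keys) is an assumption for illustration only.

```python
import base64


def resolve_schema(user_schema, fetch_table_schema):
    """Return the user-supplied schema, or look it up when none was given."""
    if user_schema is not None:
        return user_schema
    # Hypothetical callable: stands in for an API call that fetches the
    # schema of an existing table.
    return fetch_table_schema()


def encode_bytes_fields(row, schema):
    """Base64-encode every BYTES field so the row can be sent as JSON."""
    bytes_fields = {f["name"] for f in schema if f["type"] == "BYTES"}
    encoded = dict(row)
    for name in bytes_fields:
        value = encoded.get(name)
        if isinstance(value, bytes):
            encoded[name] = base64.b64encode(value).decode("ascii")
    return encoded


def decode_bytes_fields(row, schema):
    """Base64-decode every BYTES field of a row that was read back."""
    bytes_fields = {f["name"] for f in schema if f["type"] == "BYTES"}
    decoded = dict(row)
    for name in bytes_fields:
        value = decoded.get(name)
        if isinstance(value, str):
            decoded[name] = base64.b64decode(value)
    return decoded
```

With something like this, the schema lookup only happens when the user did not pass a schema, and the write to an existing table with a BYTES column still works.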


> Thanks all for your input on this!
> Juta
>
>
