On Mon, 25 Mar 2019 at 06:15, Valentyn Tymofieiev <valen...@google.com>
wrote:

> We received feedback on https://issuetracker.google.com/issues/129006689 -
> BQ developers say that schema identification is done and they discourage to
> use schema autodetection in tables using BYTES. In light of this, I think
> may be fair to recommend Beam users to specify BQ schemas as well when they
> interact with BQ, and call out that writing binary data to BQ will likely
> fail unless schema is specified. Does that make sense?
>

Given that schema autodetect does not work for bytes I think it is indeed a
good solution to require users to specify BQ schemas as well when they
write to BQ

So new summary:
1. Beam will base64-encode raw bytes, before passing them to BQ over rest
API. This will be a change in behavior for Python 2 (for good reasons).
2. When reading data from BQ, all fields of type BYTES will be
base64-decoded.
3. Beam will send an API call to BigQuery to get table schema, whenever
schema is not supplied, to work around
https://issuetracker.google.com/issues/129006689. Beam will require users
to specify the schema when writing bytes to BQ.

Thanks all for your input on this!
Juta

Reply via email to