On Mon, 25 Mar 2019 at 06:15, Valentyn Tymofieiev <valen...@google.com> wrote:
> We received feedback on https://issuetracker.google.com/issues/129006689 -
> BQ developers say that schema identification is done and they discourage to
> use schema autodetection in tables using BYTES. In light of this, I think
> may be fair to recommend Beam users to specify BQ schemas as well when they
> interact with BQ, and call out that writing binary data to BQ will likely
> fail unless schema is specified. Does that make sense?
>

Given that schema autodetection does not work for bytes, I think it is indeed
a good solution to require users to specify BQ schemas when they write to BQ.

So, the new summary:

1. Beam will base64-encode raw bytes before passing them to BQ over the REST
   API. This will be a change in behavior for Python 2 (for good reasons).
2. When reading data from BQ, all fields of type BYTES will be
   base64-decoded.
3. Beam will send an API call to BigQuery to get the table schema whenever a
   schema is not supplied, to work around
   https://issuetracker.google.com/issues/129006689. Beam will require users
   to specify the schema when writing bytes to BQ.

Thanks all for your input on this!

Juta
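
P.S. To make points 1 and 2 concrete, here is a minimal sketch of the intended round trip. It is not the actual Beam code: the helper names `encode_bytes_fields` and `decode_bytes_fields` are hypothetical, the schema is simplified to a plain dict of field name to BQ type, and it assumes Python 3, where `bytes` and `str` are distinct types.

```python
import base64
import json


def encode_bytes_fields(row):
    """Return a copy of `row` with every bytes value base64-encoded as an ASCII str.

    Hypothetical helper: illustrates point 1 (encode before the REST call).
    """
    return {
        name: base64.b64encode(value).decode("ascii")
        if isinstance(value, bytes)
        else value
        for name, value in row.items()
    }


def decode_bytes_fields(row, schema):
    """Return a copy of `row` with every field typed BYTES base64-decoded.

    Hypothetical helper: illustrates point 2 (decode on read). `schema` is
    assumed to be a simple mapping of field name to BQ type name.
    """
    return {
        name: base64.b64decode(value)
        if schema.get(name) == "BYTES" and value is not None
        else value
        for name, value in row.items()
    }


# Assumed example schema and row, for illustration only.
schema = {"id": "INTEGER", "payload": "BYTES"}
row = {"id": 1, "payload": b"\x00\xffbeam"}

encoded = encode_bytes_fields(row)   # payload is now a JSON-safe str
as_json = json.dumps(encoded)        # raw bytes would raise TypeError here
decoded = decode_bytes_fields(json.loads(as_json), schema)
```

The decode step needs the schema because an encoded BYTES value is indistinguishable from an ordinary STRING value once it is in JSON, which is one reason the schema has to be known on both the write and the read path.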