On Mon, Mar 25, 2019 at 2:16 PM Pablo Estrada <[email protected]> wrote:
> +Chamikara Jayalath <[email protected]> with the new BigQuery sink,
> schema autodetection is supported (it's a very simple thing to have). Do
> you think we should not have it?
> Best
> -P.
>

Ah, good to know. But IMO users should be able to write to existing tables
without specifying a schema (when CREATE_DISPOSITION is CREATE_NEVER, for
example). How do users enable schema auto-detection? Probably this should
not be enabled by default, and we should clearly advertise that the BYTES
type is not supported (or support it with extra information).

Just my 2 cents.

Thanks,
Cham

>
> On Mon, Mar 25, 2019 at 11:01 AM Chamikara Jayalath <[email protected]>
> wrote:
>
>>
>> On Mon, Mar 25, 2019 at 2:03 AM Juta Staes <[email protected]> wrote:
>>
>>>
>>> On Mon, 25 Mar 2019 at 06:15, Valentyn Tymofieiev <[email protected]>
>>> wrote:
>>>
>>>> We received feedback on
>>>> https://issuetracker.google.com/issues/129006689 - BQ developers say
>>>> that schema identification is done and they discourage using schema
>>>> autodetection in tables using BYTES. In light of this, I think it may
>>>> be fair to recommend that Beam users specify BQ schemas as well when
>>>> they interact with BQ, and call out that writing binary data to BQ
>>>> will likely fail unless a schema is specified. Does that make sense?
>>>>
>>>
>>> Given that schema autodetect does not work for bytes, I think it is
>>> indeed a good solution to require users to specify BQ schemas as well
>>> when they write to BQ.
>>>
>>> So new summary:
>>> 1. Beam will base64-encode raw bytes before passing them to BQ over
>>> the REST API. This will be a change in behavior for Python 2 (for good
>>> reasons).
>>> 2. When reading data from BQ, all fields of type BYTES will be
>>> base64-decoded.
>>> 3. Beam will send an API call to BigQuery to get the table schema
>>> whenever a schema is not supplied, to work around
>>> https://issuetracker.google.com/issues/129006689. Beam will require
>>> users to specify the schema when writing bytes to BQ.
>>>
>>
>> I'm not sure why we reached this conclusion. We (Beam) do not use the BQ
>> schema auto-detection feature currently. So why not just send an API
>> signal to get the schema when users are writing to existing tables? Also,
>> even if we decide to support schema auto-detection in the future, we will
>> not be able to support this for the BYTES type (due to the restriction by
>> BQ).
>>
>>
>>> Thanks all for your input on this!
>>> Juta
>>>
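
For reference, points 1 and 2 of the summary quoted above (base64-encode BYTES fields on write, base64-decode them on read) could look roughly like the sketch below. This is only an illustration using the Python standard library, not Beam's actual implementation: the helper names `encode_row` and `decode_row` and the dict-based schema (field name mapped to a BigQuery type name) are assumptions made up for this example.

```python
import base64
import json

# Hypothetical schema representation: field name -> BigQuery type name.
SCHEMA = {"name": "STRING", "payload": "BYTES"}


def encode_row(row, schema):
    """Return a copy of row with every BYTES field base64-encoded to text,
    so the row can be serialized as JSON."""
    out = {}
    for field, value in row.items():
        if schema.get(field) == "BYTES" and value is not None:
            out[field] = base64.b64encode(value).decode("ascii")
        else:
            out[field] = value
    return out


def decode_row(row, schema):
    """Return a copy of row with every BYTES field base64-decoded back to
    raw bytes."""
    out = {}
    for field, value in row.items():
        if schema.get(field) == "BYTES" and value is not None:
            out[field] = base64.b64decode(value)
        else:
            out[field] = value
    return out


row = {"name": "example", "payload": b"\x00\xff\x10"}
wire = json.dumps(encode_row(row, SCHEMA))      # JSON-safe representation
restored = decode_row(json.loads(wire), SCHEMA)  # raw bytes recovered
```

Note that both helpers need to know which fields are BYTES, which is why the schema has to come from somewhere: either supplied by the user or fetched from the existing table.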
