Valentyn Tymofieiev created BEAM-7173:
-----------------------------------------

             Summary: BigQuery connector should not enable schema autodetection 
without a user explicitly instructing it to do so. 
                 Key: BEAM-7173
                 URL: https://issues.apache.org/jira/browse/BEAM-7173
             Project: Beam
          Issue Type: Bug
          Components: io-python-gcp
            Reporter: Valentyn Tymofieiev
            Assignee: Pablo Estrada


Currently, the BQ_FILE_LOADS insertion method enables schema autodetection: 
[https://github.com/apache/beam/blob/6567f1687d53e491b337ba94f521fa2e4af35e46/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L340]

It may be more user-friendly to let users opt in to schema autodetection in 
their pipelines, across all use cases of the BQ connector. Schema autodetection 
is an approximation and does not always work.
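
A minimal sketch of the opt-in behavior suggested above, assuming a hypothetical helper build_load_config (not part of Beam) that assembles the "load" section of a BigQuery job configuration as a plain dict. The field names autodetect, schema, sourceFormat and destinationTable follow the BigQuery REST API; the helper name, its parameters and the precedence rule are illustrative assumptions only.

```python
def build_load_config(table_ref, schema=None, autodetect=False):
    """Hypothetical helper: build the 'load' part of a BigQuery job config.

    autodetect defaults to False, so schema autodetection is only
    requested when the caller asks for it explicitly.
    """
    load = {
        "destinationTable": table_ref,
        "sourceFormat": "NEWLINE_DELIMITED_JSON",
    }
    if schema is not None:
        # Assumption: an explicit schema takes precedence over autodetection.
        load["schema"] = schema
    elif autodetect:
        # Only set when the user opted in.
        load["autodetect"] = True
    return {"load": load}


table = {"projectId": "my-project", "datasetId": "my_dataset", "tableId": "my_table"}

# Default: neither a schema nor autodetection is sent, so an existing
# table's schema is what the load job has to match.
default_cfg = build_load_config(table)

# Explicit opt-in to autodetection.
opt_in_cfg = build_load_config(table, autodetect=True)
```

With this shape, a user writing to an existing table gets no autodetected schema unless they pass autodetect=True themselves.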

For example, schema autodetection cannot infer whether string data holds binary 
bytes or text, and will always assume the latter. If schema autodetection is 
enabled by default, users who need to write 'bytes' data will always have to 
specify a schema, even when writing to a table that already exists and has a 
schema. Otherwise, the autodetected schema will try to write a 'string' entry 
into a 'bytes' field, and the write will fail.
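
The ambiguity described above can be illustrated with a toy example. It assumes rows are serialized as JSON with binary values base64-encoded into JSON strings, and it uses a hypothetical naive_infer_type function as a stand-in for type inference. This is not BigQuery's actual autodetection algorithm, only a sketch of why a value-based guess cannot tell bytes from text.

```python
import base64
import json


def naive_infer_type(value):
    """Toy stand-in for schema autodetection; not BigQuery's real logic."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    # Anything else that survives JSON decoding here is treated as text.
    return "STRING"


raw = b"\x00\xffbinary"

# JSON has no bytes type, so the binary value is base64-encoded into a string.
row = {"name": "example", "payload": base64.b64encode(raw).decode("ascii")}

# Round-trip through JSON, as a file-based load would see the data.
decoded = json.loads(json.dumps(row))

# Both fields look like plain strings once decoded.
inferred = {field: naive_infer_type(value) for field, value in decoded.items()}
```

Here inferred maps both name and payload to STRING, even though payload still decodes back to the original bytes. Only an explicit schema, or the schema of the existing table, carries the information that the field is 'bytes'.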

Related discussion: 
[https://lists.apache.org/thread.html/1f9d9cb1bbbfca87d74e62ba8e58a15059ed6c20ab419002fcd3f8df@%3Cdev.beam.apache.org%3E]

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
