Valentyn Tymofieiev created BEAM-7173:
-----------------------------------------
Summary: Bigquery connector should not enable schema autodetection without a
user explicitly instructing it to do so.
Key: BEAM-7173
URL: https://issues.apache.org/jira/browse/BEAM-7173
Project: Beam
Issue Type: Bug
Components: io-python-gcp
Reporter: Valentyn Tymofieiev
Assignee: Pablo Estrada
Currently, the BQ_FILE_LOADS insertion method enables schema autodetection:
[https://github.com/apache/beam/blob/6567f1687d53e491b337ba94f521fa2e4af35e46/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L340]
It may be more user-friendly to let users opt in to schema autodetection in
their pipelines, across all use cases for the BQ connector. Schema autodetection
is an approximation and does not always work.
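As a rough sketch of what opt-in behavior could look like, the helper below builds a dict shaped like the `configuration.load` section of a BigQuery load job and sets `autodetect` only when the caller asks for it. The function name `build_load_config` and its parameters are hypothetical and are not part of the Beam code linked above; only the field names (`destinationTable`, `sourceFormat`, `schema`, `autodetect`) are meant to follow the BigQuery load job configuration.

```python
def build_load_config(table_reference, schema=None, autodetect=False):
    """Return a dict shaped like the load section of a BigQuery job config.

    Hypothetical helper for illustration: autodetection stays off unless the
    caller passes autodetect=True explicitly.
    """
    load = {
        "destinationTable": table_reference,
        "sourceFormat": "NEWLINE_DELIMITED_JSON",
    }
    if schema is not None:
        # An explicit schema always wins; nothing is inferred.
        load["schema"] = schema
    elif autodetect:
        # Only set when the user opted in.
        load["autodetect"] = True
    # With neither a schema nor autodetect, the existing table's schema
    # (if the table exists) is left for the service to apply.
    return {"load": load}


# Placeholder identifiers, not real resources.
table = {"projectId": "my-project", "datasetId": "my_dataset", "tableId": "my_table"}

default_config = build_load_config(table)
opt_in_config = build_load_config(table, autodetect=True)
```

With this shape, a pipeline writing to an already-created table sends neither a schema nor `autodetect`, so the table's own schema is the only one in play.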
For example, schema autodetection cannot infer whether string data is binary
bytes or a textual string, and will always prefer the latter. If schema
autodetection is enabled by default, users who need to write 'bytes' data will
always have to specify a schema, even when writing to a table that was already
created and has a schema. Otherwise, the load will use the autodetected schema,
try to write a 'string' entry into a 'bytes' field, and fail.
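The ambiguity can be illustrated with a small, self-contained sketch. It assumes rows are serialized as JSON with bytes values base64-encoded, so that after serialization a bytes value is just another JSON string. The `infer_type` function is a toy stand-in written for this example; it is not BigQuery's actual autodetection algorithm.

```python
import base64
import json


def infer_type(value):
    """Toy type inference over decoded JSON values (illustration only)."""
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        # A base64-encoded bytes value and ordinary text both land here.
        return "STRING"
    raise TypeError("unsupported value: %r" % (value,))


# Schema of a table that already exists (hypothetical example table).
table_schema = {"name": "STRING", "payload": "BYTES"}

# A row destined for that table; the bytes field is base64-encoded for JSON.
row = {
    "name": "example",
    "payload": base64.b64encode(b"\x00\xff").decode("ascii"),
}

# Round-trip through JSON, as a file-based load would see the data.
decoded = json.loads(json.dumps(row))
inferred_schema = {key: infer_type(value) for key, value in decoded.items()}

# Fields where the inferred type disagrees with the existing table.
mismatches = {
    key: (inferred_schema[key], table_schema[key])
    for key in table_schema
    if inferred_schema[key] != table_schema[key]
}
```

Here `mismatches` contains only the `payload` field, inferred as STRING against a BYTES column, which is the kind of conflict described above.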
Related discussion:
[https://lists.apache.org/thread.html/1f9d9cb1bbbfca87d74e62ba8e58a15059ed6c20ab419002fcd3f8df@%3Cdev.beam.apache.org%3E]