[
https://issues.apache.org/jira/browse/BEAM-7173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838711#comment-16838711
]
Pablo Estrada commented on BEAM-7173:
-------------------------------------
I don't think this needs to be in 2.13, because the BQ File Loads feature is
marked as experimental and is not the default at the moment. To activate it,
users need to pass an experiment flag.
> Bigquery connector should not enable schema autodetection without a user
> explicitly instructing to do so.
> ----------------------------------------------------------------------------------------------------------
>
> Key: BEAM-7173
> URL: https://issues.apache.org/jira/browse/BEAM-7173
> Project: Beam
> Issue Type: Bug
> Components: io-python-gcp
> Reporter: Valentyn Tymofieiev
> Assignee: Pablo Estrada
> Priority: Major
> Fix For: 2.13.0
>
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Currently, the BQ_FILE_LOADS insertion method enables schema autodetection:
> [https://github.com/apache/beam/blob/6567f1687d53e491b337ba94f521fa2e4af35e46/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L340]
> It may be more user-friendly to let users opt in to schema autodetection
> in their pipelines across all use cases for the BQ connector. Schema
> autodetection is an approximation, and does not always work.
> For example, schema autodetection cannot infer whether string data is
> binary bytes or a textual string, and will always prefer the latter. If schema
> autodetection is enabled by default, users who need to write 'bytes' data
> will always have to specify a schema, even when writing to a table that
> already exists and has a schema. Otherwise, the autodetected schema will try
> to write a 'string' entry into a 'bytes' field, and the write will fail.
> Related discussion:
> [https://lists.apache.org/thread.html/1f9d9cb1bbbfca87d74e62ba8e58a15059ed6c20ab419002fcd3f8df@%3Cdev.beam.apache.org%3E]
>
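The bytes-versus-string ambiguity described in the issue can be illustrated with a small standalone sketch. It does not use Beam or the BigQuery API; `naive_autodetect` and `to_json_row` are hypothetical helpers that only assume bytes values are base64-encoded before being written as JSON, after which a type-guessing step has nothing but a string to look at.

```python
import base64
import json


def to_json_row(row):
    """Serialize a row to JSON. JSON has no bytes type, so bytes values are
    base64-encoded into plain strings (an assumption made for this sketch)."""
    encoded = {
        key: base64.b64encode(value).decode("ascii")
        if isinstance(value, bytes) else value
        for key, value in row.items()
    }
    return json.dumps(encoded)


def naive_autodetect(value):
    """Hypothetical stand-in for schema autodetection on one JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"  # a base64-encoded bytes value also ends up here
    raise TypeError("unsupported value: %r" % (value,))


# A row whose 'payload' column is meant for a BYTES field.
row = {"name": "alice", "payload": b"\x00\x01\x02"}
loaded = json.loads(to_json_row(row))

# What a type-guessing step would infer from the serialized data alone.
detected = {key: naive_autodetect(value) for key, value in loaded.items()}

# Schema of the (hypothetical) table that already exists.
table_schema = {"name": "STRING", "payload": "BYTES"}

# Columns where the guessed type disagrees with the existing table.
mismatches = {
    key: (detected[key], table_schema[key])
    for key in table_schema
    if detected[key] != table_schema[key]
}
print(mismatches)
```

In this sketch the guessed type for `payload` disagrees with the existing table, which is the kind of conflict the issue says causes the write to fail unless the user supplies a schema explicitly.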