[ 
https://issues.apache.org/jira/browse/BEAM-7173?focusedWorklogId=237986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-237986
 ]

ASF GitHub Bot logged work on BEAM-7173:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/May/19 18:23
            Start Date: 06/May/19 18:23
    Worklog Time Spent: 10m 
      Work Description: pabloem commented on issue #8473: [BEAM-7173] Avoiding 
schema autodetection by default in WriteToBigQuery
URL: https://github.com/apache/beam/pull/8473#issuecomment-489723208
 
 
   Run Python PostCommit
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 237986)
    Time Spent: 1h 20m  (was: 1h 10m)

> Bigquery connector should not enable schema autodetection without a user 
> explicitly instructing to do so. 
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7173
>                 URL: https://issues.apache.org/jira/browse/BEAM-7173
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-gcp
>            Reporter: Valentyn Tymofieiev
>            Assignee: Pablo Estrada
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently the BQ_FILE_LOADS insertion method enables schema autodetection: 
> [https://github.com/apache/beam/blob/6567f1687d53e491b337ba94f521fa2e4af35e46/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L340]
>  It may be more user-friendly to allow users to opt in to schema 
> autodetection in their pipelines across all use cases for the BQ connector. 
> Schema autodetection is an approximation and does not always work.
> For example, schema autodetection cannot infer whether string data is 
> binary bytes or a textual string, and will always prefer the latter. If 
> schema autodetection is enabled by default, users who need to write 'bytes' 
> data will always have to specify a schema, even when writing to a table that 
> was already created and has a schema. Otherwise the autodetected schema will 
> try to write a 'string' entry into a 'bytes' field and the write will fail.
> Related discussion: 
> [https://lists.apache.org/thread.html/1f9d9cb1bbbfca87d74e62ba8e58a15059ed6c20ab419002fcd3f8df@%3Cdev.beam.apache.org%3E]
>  
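
The bytes-versus-string ambiguity described in the issue can be sketched in
a few lines of standard-library Python. This is only an illustration under
stated assumptions: naive_infer_type is a hypothetical stand-in for schema
autodetection (it is not BigQuery's or Beam's actual algorithm), and the row
and table schema are made up. It relies on the fact that JSON has no binary
type, so bytes values are written as base64 text and look like any other
string once serialized.

```python
import base64
import json


def naive_infer_type(value):
    """Hypothetical stand-in for schema autodetection (not the real algorithm)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


def to_json_row(row):
    """Serialize a row as JSON; bytes values become base64 text."""
    encoded = {
        key: base64.b64encode(val).decode("ascii") if isinstance(val, bytes) else val
        for key, val in row.items()
    }
    return json.dumps(encoded)


# Made-up row with one binary field, and the schema of an existing table.
row = {"name": "beam", "payload": b"\x00\x01\xff"}
table_schema = {"name": "STRING", "payload": "BYTES"}

# After serialization the binary field is indistinguishable from text.
serialized = json.loads(to_json_row(row))
inferred = {key: naive_infer_type(val) for key, val in serialized.items()}

# Fields where the inferred type disagrees with the existing table.
mismatches = {
    key: (inferred[key], table_schema[key])
    for key in table_schema
    if inferred[key] != table_schema[key]
}
print(mismatches)
```

In this sketch the payload field is inferred as STRING while the table
declares BYTES, which mirrors the failure mode the issue describes when
autodetection is enabled and the user supplies no schema.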



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
