satybald edited a comment on pull request #15185:
URL: https://github.com/apache/beam/pull/15185#issuecomment-906714140


   > With that said, if your proposal is that we should provide a single, 
centrally-maintained compat layer that customers can make use of, that makes 
sense to me. The DATETIME issue is particularly galling as an end user -- let's 
start there
   
   My proposal here is to strive for simplicity and user adoption. I don't see 
the point of having one public ReadFromBigQuery (RFBQ) API that parses 
`DATETIME` or any other type in three different ways. From the customer's 
perspective this is a nightmare, given that RFBQ is usually the source of the 
pipeline. If a user decides to adopt BQ Storage Read instead of Avro export, 
they might need to test and modify the pipeline in several places. 
   
   > Seems like you've run into issues even before this PR was merged
   
   I tested this PR early and was planning to switch some of the pipelines, 
but after reading the docs further, I found that it would be a pretty hard 
task to migrate an existing pipeline from `EXPORT` to `DIRECT_READ`.
   
   
   Here are the quotes from the documentation that describe the issue:
   
   
   >   .. warning::
   >       DATETIME columns are parsed as strings in the fastavro library. As a
   >       result, such columns will be converted to Python strings instead of native
   >       Python DATETIME type
   
   
https://github.com/apache/beam/blob/7cad244c2c668cb92d84c5e9b951a0dbffae5017/sdks/python/apache_beam/io/gcp/bigquery.py#L2132
   
   
   > When using JSON exports, the BigQuery types for DATE, DATETIME, TIME, and
   >       TIMESTAMP will be exported as strings. This behavior is consistent with
   >       BigQuerySource.
   >       When using Avro exports, these fields will be exported as native Python
   >       types (datetime.date, datetime.datetime, datetime.datetime,
   >       and datetime.datetime respectively). Avro exports are recommended.
   
   
https://github.com/apache/beam/blob/7cad244c2c668cb92d84c5e9b951a0dbffae5017/sdks/python/apache_beam/io/gcp/bigquery.py#L2192
   
   
   My shallow understanding of the situation is that the issue lies mainly in 
how `python-bigquery-storage` parses `DATETIME` (see [1]), rather than in 
`fastavro`.
   
   [1] 
https://github.com/googleapis/python-bigquery-storage/blob/master/google/cloud/bigquery_storage_v1/reader.py#L660-L661
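
To illustrate the kind of compat layer discussed above, here is a minimal 
user-side sketch that coerces `DATETIME` values to `datetime.datetime` whether 
they arrive as strings or as native objects. The helper names 
`normalize_datetime` and `normalize_row` are hypothetical and not part of Beam, 
and the sketch assumes string values are in an ISO-8601-like form that 
`datetime.datetime.fromisoformat` accepts; other formats would need their own 
handling.

```python
import datetime


def normalize_datetime(value):
    """Coerce a DATETIME value to datetime.datetime (hypothetical helper).

    Accepts None, a native datetime.datetime, or an ISO-8601-like string.
    Assumption: string values can be parsed by datetime.fromisoformat.
    """
    if value is None or isinstance(value, datetime.datetime):
        return value
    if isinstance(value, str):
        return datetime.datetime.fromisoformat(value)
    raise TypeError("Unsupported DATETIME value type: %r" % type(value))


def normalize_row(row, datetime_fields):
    """Return a copy of a row dict with the named fields normalized."""
    out = dict(row)
    for name in datetime_fields:
        if name in out:
            out[name] = normalize_datetime(out[name])
    return out
```

In a pipeline, something like `beam.Map(normalize_row, datetime_fields=("created_at",))` 
placed right after the read step might keep downstream transforms independent 
of the read method; the field name `created_at` is only an example.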
     


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
