satybald commented on pull request #15185:
URL: https://github.com/apache/beam/pull/15185#issuecomment-906714140
> With that said, if your proposal is that we should provide a single,
> centrally-maintained compat layer that customers can make use of, that makes
> sense to me. The DATETIME issue is particularly galling as an end user -- let's
> start there
My proposal here is to strive for simplicity and user adoption. I don't see the
point of having one public ReadFromBigQuery (RFBQ) API that parses `DATETIME`,
or any other type, in three different ways. From the customer's perspective this
is a nightmare, given that RFBQ is usually the source of the pipeline. If a user
decides to adopt BQ Storage Read instead of Avro export, they might need to
test and modify the pipeline in several places.
> Seems like you've run into issues even before this PR was merged
I tested this PR early and planned to switch some of our pipelines, but after
reading the docs further I found that migrating an existing pipeline from
`EXPORT` to `DIRECT_READ` would be a pretty hard task.
Here are the quotes from the documentation that describe the issue:
.. warning::
    DATETIME columns are parsed as strings in the fastavro library. As a
    result, such columns will be converted to Python strings instead of
    native Python DATETIME type
https://github.com/apache/beam/blob/7cad244c2c668cb92d84c5e9b951a0dbffae5017/sdks/python/apache_beam/io/gcp/bigquery.py#L2132
When using JSON exports, the BigQuery types for DATE, DATETIME, TIME, and
TIMESTAMP will be exported as strings. This behavior is consistent with
BigQuerySource.
When using Avro exports, these fields will be exported as native Python
types (datetime.date, datetime.datetime, datetime.datetime,
and datetime.datetime respectively). Avro exports are recommended.
https://github.com/apache/beam/blob/7cad244c2c668cb92d84c5e9b951a0dbffae5017/sdks/python/apache_beam/io/gcp/bigquery.py#L2192
My shallow understanding of the situation is that the issue lies mainly in how
`python-bigquery-storage` parses `DATETIME`, rather than in fastavro; see the
[source code](https://github.com/googleapis/python-bigquery-storage/blob/master/google/cloud/bigquery_storage_v1/reader.py#L660-L661).
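
To illustrate the kind of user-side workaround this forces today, here is a minimal sketch of a normalization step. The helper names `normalize_datetime` and `normalize_row` are hypothetical and not part of Beam, and the sketch assumes that string-valued `DATETIME` columns arrive in an ISO-8601-like form, which should be checked against the actual reader output:

```python
import datetime


def normalize_datetime(value):
    """Return a datetime.datetime whether the reader gave a str or a native value.

    Hypothetical helper, not part of Beam. Assumes string values are
    ISO-8601-like, e.g. "2021-08-26T12:34:56" or "2021-08-26 12:34:56.123456".
    """
    if value is None or isinstance(value, datetime.datetime):
        return value
    if isinstance(value, str):
        return datetime.datetime.fromisoformat(value)
    raise TypeError(f"Unsupported DATETIME representation: {type(value)!r}")


def normalize_row(row, datetime_fields):
    """Return a copy of a row dict with the named DATETIME fields normalized."""
    normalized = dict(row)
    for field in datetime_fields:
        if field in normalized:
            normalized[field] = normalize_datetime(normalized[field])
    return normalized
```

In a pipeline, something like this could be applied with a `beam.Map` step right after the read, but every user would have to write and maintain it separately, which is exactly the duplication I would like to avoid.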