[
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394526
]
ASF GitHub Bot logged work on BEAM-8841:
----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Feb/20 23:24
Start Date: 27/Feb/20 23:24
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #10979: [BEAM-8841]
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384769744
##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -1361,87 +1369,18 @@ def __init__(
self.triggering_frequency = triggering_frequency
self.insert_retry_strategy = insert_retry_strategy
self._validate = validate
+ self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON
Review comment:
I'm happy to make AVRO the default format if possible. I guess the issue is
that users need to provide the schema, right? Otherwise we cannot write the
Avro files.
We could make AVRO the default and add a check that a schema was provided
(i.e. that it is neither None nor autodetect), erroring out if it wasn't.
What do you think?
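Something like the rough sketch below, as an illustration only: the helper
name and the SCHEMA_AUTODETECT sentinel here are placeholders, not the final
code.

from apache_beam.io.gcp import bigquery_tools

# Placeholder for the sentinel WriteToBigQuery uses for schema auto-detection.
SCHEMA_AUTODETECT = 'SCHEMA_AUTODETECT'


def _resolve_temp_file_format(temp_file_format, schema):
  # Default to AVRO, but require an explicit schema, since we cannot build
  # Avro files without one.
  file_format = temp_file_format or bigquery_tools.FileFormat.AVRO
  if file_format == bigquery_tools.FileFormat.AVRO and (
      schema is None or schema == SCHEMA_AUTODETECT):
    raise ValueError(
        'An explicit schema is required when writing Avro temp files; '
        'use temp_file_format=FileFormat.JSON for schema auto-detection.')
  return file_format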
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 394526)
Time Spent: 2h 40m (was: 2.5h)
> Add ability to perform BigQuery file loads using avro
> -----------------------------------------------------
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Chun Yang
> Assignee: Chun Yang
> Priority: Minor
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python
> SDK. JSON has some disadvantages, including the size of the serialized data
> and the inability to represent NaN and infinity float values.
> BigQuery supports loading files in Avro format, which overcomes these
> disadvantages. The Java SDK already supports loading files in Avro format
> (BEAM-2879), so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].
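> As a rough sketch of how the Python SDK usage could look (the
> temp_file_format parameter follows the open PR and should be treated as
> tentative; project, dataset, and table names are placeholders):
>
> import math
>
> import apache_beam as beam
> from apache_beam.io.gcp import bigquery_tools
>
> with beam.Pipeline() as p:
>   _ = (
>       p
>       # NaN and infinity cannot be represented in JSON temp files, but can
>       # be written via Avro temp files.
>       | beam.Create([{'id': 1, 'score': float('nan')},
>                      {'id': 2, 'score': math.inf}])
>       | beam.io.WriteToBigQuery(
>           'my-project:my_dataset.my_table',
>           schema='id:INTEGER, score:FLOAT',
>           method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
>           temp_file_format=bigquery_tools.FileFormat.AVRO))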
--
This message was sent by Atlassian Jira
(v8.3.4#803005)