[jira] [Commented] (BEAM-11230) ReadFromBigQuery fails when the table has repeated records

Kamil Wasilewski (Jira) Fri, 11 Dec 2020 04:34:06 -0800


    [ https://issues.apache.org/jira/browse/BEAM-11230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247891#comment-17247891 ]


Kamil Wasilewski commented on BEAM-11230:
-----------------------------------------

Thanks, I'll take care of that bug. In the meantime, you can pass 
`use_json_exports=False` to your ReadFromBigQuery transform. With this 
parameter set to False, the transform exports the BigQuery table to Avro 
files instead of JSON files, which should work around the problem.

> ReadFromBigQuery fails when the table has repeated records
> ----------------------------------------------------------
>
>                 Key: BEAM-11230
>                 URL: https://issues.apache.org/jira/browse/BEAM-11230
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.25.0
>            Reporter: Alvaro
>            Assignee: Kamil Wasilewski
>            Priority: P2
>
> This is very similar to the issue reported here: 
> https://issues.apache.org/jira/browse/BEAM-10524
> I upgraded the Python SDK from 2.24 to 2.25, and ReadFromBigQuery 
> started failing with this stack trace:
>  
> {code:java}
> ....
> "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
>     work_executor.execute()
>   File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
>     op.start()
>   File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 89, in read
>     range_tracker.sub_range_tracker(source_ix)):
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 210, in read_records
>     yield self._coder.decode(record)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 633, in decode
>     return self._decode_with_schema(value, self.fields)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 656, in _decode_with_schema
>     value[field.name] = converter(value[field.name])
> TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
> {code}
> According to BEAM-10524, this should have been fixed in 2.25, but in my case 
> it is the opposite: the read started failing after the upgrade.
> Code: 
> https://github.com/apache/beam/blob/release-2.25.0/sdks/python/apache_beam/io/gcp/bigquery.py#L656
>  
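The traceback ends in `_decode_with_schema`, which applies a scalar converter such as `int()` to `value[field.name]`. For a field whose mode is REPEATED, the decoded JSON value appears to be a list, so the converter raises the TypeError shown. The sketch below is a hypothetical, simplified decoder that illustrates mode-aware conversion: it converts each element of a REPEATED field and recurses into RECORD fields. Its `Field` tuple, `CONVERTERS` table and `decode_row` function are not Beam APIs, and this is not the actual fix for BEAM-11230.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Tuple


class Field(NamedTuple):
    """Hypothetical, simplified stand-in for a BigQuery schema field."""
    name: str
    type: str
    mode: str = "NULLABLE"
    fields: Tuple["Field", ...] = ()  # sub-fields, used only for RECORD


# Scalar converters for a small, illustrative subset of BigQuery types.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "INTEGER": int,
    "FLOAT": float,
    "STRING": str,
}


def convert_value(value: Any, field: Field) -> Any:
    """Convert one non-repeated value; RECORD values are decoded recursively."""
    if field.type == "RECORD":
        return decode_row(value, list(field.fields))
    return CONVERTERS[field.type](value)


def decode_row(row: Dict[str, Any], fields: List[Field]) -> Dict[str, Any]:
    """Return a copy of row with every known field converted by its schema."""
    out = dict(row)
    for field in fields:
        value = out.get(field.name)
        if value is None:
            continue  # NULL or absent column: leave it untouched
        if field.mode == "REPEATED":
            # A REPEATED value is a list: convert element by element instead
            # of passing the whole list to a scalar converter such as int().
            out[field.name] = [convert_value(v, field) for v in value]
        else:
            out[field.name] = convert_value(value, field)
    return out


if __name__ == "__main__":
    schema = [
        Field("id", "INTEGER"),
        Field("scores", "INTEGER", "REPEATED"),
        Field("items", "RECORD", "REPEATED", (Field("qty", "INTEGER"),)),
    ]
    row = {"id": "7", "scores": ["1", "2"], "items": [{"qty": "3"}]}
    print(decode_row(row, schema))
    # prints {'id': 7, 'scores': [1, 2], 'items': [{'qty': 3}]}
    try:
        CONVERTERS["INTEGER"](row["scores"])  # what a mode-unaware decoder does
    except TypeError as exc:
        print("TypeError:", exc)
```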



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
