Kamil Wasilewski created BEAM-8528:
--------------------------------------

             Summary: BigQuery bounded source does not work on DirectRunner
                 Key: BEAM-8528
                 URL: https://issues.apache.org/jira/browse/BEAM-8528
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
            Reporter: Kamil Wasilewski


{code:python}
  File "/home/Kamil/projects/beam/sdks/python/apache_beam/io/gcp/bigquery.py", line 639, in get_range_tracker
    raise NotImplementedError('BigQuery source must be split before being read')
NotImplementedError: BigQuery source must be split before being read
{code}
 

The _get_range_tracker_ and _read_ methods are not implemented in __BigQuerySource_. 
This is intentional: the runner is expected to call _split_ instead. The Java 
implementation works the same way: 
[link|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySourceBase.java]
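
The split-before-read contract described above can be illustrated with a minimal, dependency-free sketch. It is not the actual apache_beam code: _SplitOnlySource_, _ReadableChunk_, _read_directly_ and _read_after_split_ are hypothetical names, and _split_ here yields readable sub-sources directly, whereas Beam wraps them in bundle objects. It only shows why a runner that reads the top-level source fails while one that splits first succeeds.

```python
# Hypothetical sketch: these classes only mimic the shape of a bounded source
# and are NOT the real apache_beam implementation.


class ReadableChunk:
    """A sub-source produced by split(); it can be read on its own."""

    def __init__(self, rows):
        self._rows = rows

    def get_range_tracker(self, start_position, stop_position):
        # No real position tracking is needed for this illustration.
        return None

    def read(self, range_tracker):
        return iter(self._rows)


class SplitOnlySource:
    """Stand-in for a source that must be split before it is read."""

    def __init__(self, rows):
        self._rows = rows

    def split(self, desired_bundle_size):
        # Yield sub-sources covering consecutive slices of the rows.
        for start in range(0, len(self._rows), desired_bundle_size):
            yield ReadableChunk(self._rows[start:start + desired_bundle_size])

    def get_range_tracker(self, start_position, stop_position):
        raise NotImplementedError('source must be split before being read')

    def read(self, range_tracker):
        raise NotImplementedError('source must be split before being read')


def read_directly(source):
    """Mimics a runner that reads the top-level source without splitting."""
    tracker = source.get_range_tracker(None, None)
    return list(source.read(tracker))


def read_after_split(source, desired_bundle_size=2):
    """Mimics a runner that calls split() first and reads each sub-source."""
    rows = []
    for chunk in source.split(desired_bundle_size):
        tracker = chunk.get_range_tracker(None, None)
        rows.extend(chunk.read(tracker))
    return rows
```

In this sketch, _read_after_split_ returns all rows, while _read_directly_ raises the same kind of NotImplementedError shown in the traceback.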

It seems that DataflowRunner and FlinkRunner are somehow able to handle these 
exceptions, while DirectRunner is not.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
