[ https://issues.apache.org/jira/browse/BEAM-12773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles updated BEAM-12773:
-----------------------------------
Status: Open (was: Triage Needed)
> 404 Session not found when querying Google Cloud Spanner with Python Dataflow
> -----------------------------------------------------------------------------
>
> Key: BEAM-12773
> URL: https://issues.apache.org/jira/browse/BEAM-12773
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Affects Versions: 2.29.0, 2.33.0
> Reporter: Reto Egeter
> Priority: P3
> Attachments: dataflow_inprogress_2.29.0.png,
> dataflow_spanner_error_2.29.0.png, dataflow_spanner_error_2.33.0.png
>
>
> My Dataflow job copies a SQL table with 230M rows into Cloud Spanner. The initial
> run succeeds, but any subsequent run fails with
> "google.api_core.exceptions.NotFound: 404 Session not found"
> and also "504 Deadline Exceeded" errors.
> Here is part of the code:
> {code:python}
> SPANNER_QUERY = 'SELECT row_id, update_key FROM DomainsCluster2'
>
> def _KeyDomainSpanner(entity):
>   # Convert the row (a sequence of column values) into a dict keyed by row_id.
>   row = {}
>   for i, column in enumerate(['row_id', 'update_key']):
>     row[column] = entity[i]
>   return row['row_id'], row
>
> spanner_domains = (
>     p
>     | 'ReadFromSpanner' >> ReadFromSpanner(
>         project_id, database, database, sql=SPANNER_QUERY)
>     | 'KeyDomainsSpanner' >> beam.Map(_KeyDomainSpanner))
> {code}
> With 2.29.0 the Dataflow job reads around 10M rows before failing, but with
> 2.33.0 it reads only a few thousand.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)