[
https://issues.apache.org/jira/browse/BEAM-12773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443148#comment-17443148
]
Beam JIRA Bot commented on BEAM-12773:
--------------------------------------
This issue is P2 but has been unassigned without any comment for 60 days, so it
has been labeled "stale-P2". If this issue is still affecting you, we care!
Please comment and remove the label. Otherwise, in 14 days the issue will be
moved to P3.
Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed
explanation of what these priorities mean.
> 404 Session not found when querying Google Cloud Spanner with Python
> Dataflow.
> -------------------------------------------------------------------------------
>
> Key: BEAM-12773
> URL: https://issues.apache.org/jira/browse/BEAM-12773
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Affects Versions: 2.29.0, 2.33.0
> Reporter: Reto Egeter
> Priority: P2
> Labels: stale-P2
> Attachments: dataflow_inprogress_2.29.0.png,
> dataflow_spanner_error_2.29.0.png, dataflow_spanner_error_2.33.0.png
>
>
> My Dataflow job copies a SQL table with 230M rows into Cloud Spanner. The
> initial run is successful, but any subsequent run fails with
> "google.api_core.exceptions.NotFound: 404 Session not found"
> and also with "504 Deadline Exceeded" errors.
> Here is part of the code:
> {code:python}
> import apache_beam as beam
> from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner
>
> SPANNER_QUERY = 'SELECT row_id, update_key FROM DomainsCluster2'
>
> def _KeyDomainSpanner(entity):
>   # Map the positional Spanner row to a dict and key the output by row_id.
>   row = {}
>   for i, column in enumerate(['row_id', 'update_key']):
>     row[column] = entity[i]
>   return row['row_id'], row
>
> spanner_domains = (
>     p
>     | 'ReadFromSpanner' >> ReadFromSpanner(
>         project_id, instance_id, database_id, sql=SPANNER_QUERY)
>     | 'KeyDomainsSpanner' >> beam.Map(_KeyDomainSpanner))
> {code}
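> A possible mitigation, sketched below under assumptions that are not in the
> snippet above (an integer row_id with a known upper bound, and placeholder
> resource names), is to shard the query so that no single Spanner read
> outlives its session:
> {code:python}
> import apache_beam as beam
> from apache_beam.io.gcp.experimental.spannerio import ReadFromSpanner
>
> # Placeholder resource names and a hypothetical row_id range; neither is
> # taken from the original job.
> PROJECT_ID = 'my-project'
> INSTANCE_ID = 'my-instance'
> DATABASE_ID = 'my-database'
> NUM_SHARDS = 16
> MAX_ROW_ID = 230_000_000
>
> def _shard_queries():
>     # Emit one bounded query per shard so each Spanner read stays short.
>     step = -(-MAX_ROW_ID // NUM_SHARDS)  # ceiling division
>     for lo in range(0, MAX_ROW_ID, step):
>         yield ('SELECT row_id, update_key FROM DomainsCluster2 '
>                'WHERE row_id >= %d AND row_id < %d' % (lo, lo + step))
>
> with beam.Pipeline() as p:
>     shards = [
>         p | 'Read%d' % i >> ReadFromSpanner(
>             PROJECT_ID, INSTANCE_ID, DATABASE_ID, sql=query)
>         for i, query in enumerate(_shard_queries())
>     ]
>     merged = shards | 'FlattenShards' >> beam.Flatten()
> {code}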
> The Dataflow job is able to read around 10M rows with 2.29.0, but only a few
> thousand with 2.33.0.
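> To check whether the session expiry reproduces outside Dataflow, the same
> query can be run directly with the Cloud Spanner client library. A minimal
> sketch, using placeholder resource names:
> {code:python}
> from google.cloud import spanner
>
> # Placeholder resource names; substitute the real ones.
> client = spanner.Client(project='my-project')
> database = client.instance('my-instance').database('my-database')
>
> count = 0
> with database.snapshot() as snapshot:
>     for row in snapshot.execute_sql(
>             'SELECT row_id, update_key FROM DomainsCluster2'):
>         count += 1  # consume the full result set to keep the read open
> print('read %d rows' % count)
> {code}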
--
This message was sent by Atlassian Jira
(v8.20.1#820001)