[
https://issues.apache.org/jira/browse/SPARK-31931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jungtaek Lim updated SPARK-31931:
---------------------------------
Priority: Major (was: Blocker)
> When using GCS as checkpoint location for Structured Streaming aggregation
> pipeline, the Spark writing job is aborted
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-31931
> URL: https://issues.apache.org/jira/browse/SPARK-31931
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.4.5
> Environment: GCP Dataproc 1.5 Debian 10 (Hadoop 2.10.0, Spark 2.4.5,
> Cloud Storage Connector hadoop2.2.1.3, Scala 2.12.10)
> Reporter: Adrian Jones
> Priority: Major
> Attachments: spark-structured-streaming-error
>
>
> Structured streaming checkpointing does not work with Google Cloud Storage
> when there are aggregations included in the streaming pipeline.
> Using GCS as the external store works fine when there are no aggregations
> present in the pipeline; however, once an aggregation (e.g. a groupBy) is
> introduced, the attached error is thrown.
> The error is thrown only when aggregating and pointing checkpointLocation to
> GCS. The exact same code works fine when pointing checkpointLocation to HDFS.
> Is it expected for GCS to function as a checkpoint location for aggregated
> pipelines? Are efforts currently in progress to enable this? Is it on a
> roadmap?
>
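A minimal sketch of the reported scenario. The bucket name, rate source, and query shape below are hypothetical stand-ins, not taken from the reporter's job; the essential point is that the aggregation forces Spark to use the state store, whose checkpoint files are written under checkpointLocation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object GcsCheckpointRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("gcs-checkpoint-repro")
      .getOrCreate()
    import spark.implicits._

    // The built-in rate source stands in for the real input stream.
    val counts = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 10)
      .load()
      .groupBy(window($"timestamp", "1 minute")) // aggregation -> state store
      .count()

    counts.writeStream
      .outputMode("update")
      .format("console")
      // Per the report: an HDFS path here works, a gs:// path triggers
      // the attached error once the groupBy above is present.
      .option("checkpointLocation", "gs://my-bucket/checkpoints/agg-query")
      .start()
      .awaitTermination()
  }
}
```

Without the groupBy/count, the same query checkpoints to GCS without error, which is what narrows the problem to the state store files rather than checkpointing in general.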
--
This message was sent by Atlassian Jira
(v8.3.4#803005)