[
https://issues.apache.org/jira/browse/SPARK-31931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17128705#comment-17128705
]
Jungtaek Lim commented on SPARK-31931:
--------------------------------------
Critical+ is reserved for committers; lowering the priority.
The checkpoint mechanism relies on atomic rename by default, which may not be
supported by object stores. S3 is a known unsupported case, and I'd guess GCS
may be another. I see this as a "good to have" rather than an "essential"
feature.
You may get a better answer from the user mailing list than from filing an
issue. Since you're using GCP, consulting Google might be your best option.
> When using GCS as checkpoint location for Structured Streaming aggregation
> pipeline, the Spark writing job is aborted
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-31931
> URL: https://issues.apache.org/jira/browse/SPARK-31931
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.4.5
> Environment: GCP Dataproc 1.5 Debian 10 (Hadoop 2.10.0, Spark 2.4.5,
> Cloud Storage Connector hadoop2.2.1.3, Scala 2.12.10)
> Reporter: Adrian Jones
> Priority: Major
> Attachments: spark-structured-streaming-error
>
>
> Structured streaming checkpointing does not work with Google Cloud Storage
> when there are aggregations included in the streaming pipeline.
> Using GCS as the external store works fine when there are no aggregations
> present in the pipeline; however, once an aggregation (e.g. groupBy) is
> introduced, the attached error is thrown.
> The error is only thrown when aggregating and pointing checkpointLocation to
> GCS. The exact same code works fine when pointing checkpointLocation to HDFS.
> Is it expected for GCS to function as a checkpoint location for aggregated
> pipelines? Are efforts currently in progress to enable this? Is it on a
> roadmap?
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)