[ https://issues.apache.org/jira/browse/SPARK-26647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grid updated SPARK-26647:
-------------------------
Environment: Spark Kubernetes {{2.4.0}} on GKE {{1.11.5-gke.5}}
    was: Using spark kubernetes {{2.4.0}} on gke {{1.11.5-gke.5}}

Description:
When using Spark on Kubernetes with the latest connector jar
{{[https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar]}}
(I don't know which connector version this corresponds to), I have a Spark job that writes about 10 GB of data to GCS using the DataFrame writer:

df.write.json(path_to_gcs_bucket)

The job and its stages are reported as complete, but I can still see part files being written in the background:

{{gs://mybucket/output/ZGM0YTg3Nzk2NDEwY2ViY2FhNTYwZTZi/part-00124-e86f3a48-72f7-4bf7-bdc4-328e97cdc7b1-c000.json}}

The job is marked as success while GCS writes are still going on in the background. The job stage should be updated/reported correctly and not be marked as {{success}} until the writes finish. Once the writes have completed, the SparkContext {{stop()}} call is reached and the job terminates.
> Spark Job marked as success when data is still being written to GCS
> -------------------------------------------------------------------
>
>                 Key: SPARK-26647
>                 URL: https://issues.apache.org/jira/browse/SPARK-26647
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>         Environment: Spark Kubernetes {{2.4.0}} on GKE {{1.11.5-gke.5}}
>            Reporter: Grid
>            Priority: Major
>         Attachments: 51244468-1971b700-197d-11e9-9682-f021f1bc64e7.png
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)