steveloughran edited a comment on pull request #2399: URL: https://github.com/apache/hadoop/pull/2399#issuecomment-724222044
Running integration tests on this with Spark + patch and the 3.4.0-SNAPSHOT builds. Ignoring compilation issues with Spark trunk, hadoop-trunk, Scala versions and scalatest, I'm running tests in [cloud-integration](https://github.com/hortonworks-spark/cloud-integration)

```
S3AParquetPartitionSuite:
2020-11-09 10:55:36,664 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO commit.AbstractS3ACommitter (AbstractS3ACommitter.java:<init>(180)) - Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID
2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO output.FileOutputCommitter (FileOutputCommitter.java:<init>(141)) - File Output Committer Algorithm version is 1
2020-11-09 10:55:36,733 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO output.FileOutputCommitter (FileOutputCommitter.java:<init>(156)) - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO commit.AbstractS3ACommitterFactory (S3ACommitterFactory.java:createTaskCommitter(83)) - Using committer directory to output data to s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
2020-11-09 10:55:36,734 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO commit.AbstractS3ACommitterFactory (AbstractS3ACommitterFactory.java:createOutputCommitter(54)) - Using Committer StagingCommitter{AbstractS3ACommitter{role=Task committer attempt_20201109105536_0000_m_000000_0, name=directory, outputPath=s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo, workPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536_0000_m_000000_0/_temporary/0/_temporary/attempt_20201109105536_0000_m_000000_0, uuid='d6b6cd70-0303-46a6-8ff4-240dd14511d6', uuid source=JobUUIDSource{text='spark.sql.sources.writeJobUUID'}}, commitsDirectory=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads, uniqueFilenames=true, conflictResolution=APPEND. uploadPartSize=67108864, wrappedCommitter=FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl{jobId=job_20201109105536_0000}; taskId=attempt_20201109105536_0000_m_000000_0, status=''}; org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter@759c53e5}; outputPath=file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads, workPath=null, algorithmVersion=1, skipCleanup=false, ignoreCleanupFailures=false}} for s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3AParquetPartitionSuite/part-columns/p1=1/p2=foo
2020-11-09 10:55:36,736 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO staging.DirectoryStagingCommitter (DirectoryStagingCommitter.java:setupJob(71)) - Conflict Resolution mode is APPEND
2020-11-09 10:55:36,879 [ScalaTest-main-running-S3AParquetPartitionSuite] INFO commit.AbstractS3AC
```

1. Spark is passing down a unique job ID (the committer is configured to require it): `Job UUID d6b6cd70-0303-46a6-8ff4-240dd14511d6 source spark.sql.sources.writeJobUUID`
1. This is used for the local FS work path of the staging committer: `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/target/test/s3a/d6b6cd70-0303-46a6-8ff4-240dd14511d6-attempt_20201109105536_0000_m_000000_0/_temporary/0/_temporary/attempt_20201109105536_0000_m_000000_0`
1. And for the cluster FS (which is file:// here): `file:/Users/stevel/Projects/sparkwork/cloud-integration/cloud-examples/tmp/staging/stevel/d6b6cd70-0303-46a6-8ff4-240dd14511d6/staging-uploads`

That is: Spark is setting the UUID and the committer is picking it up and using it as appropriate.
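To make the path layout in that log easier to follow, here is a minimal sketch of how a job UUID could be read from the job configuration and folded into the two paths. It is not the S3A committer's code: the class `JobUuidSketch`, its methods, and the plain `Map` standing in for the job configuration are all hypothetical, and only the option name `spark.sql.sources.writeJobUUID` and the path shapes are taken from the log output.

```java
import java.util.Map;

/** Hypothetical illustration only; not part of hadoop-aws. */
public class JobUuidSketch {

    // Option name as reported in the committer log.
    static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";

    /**
     * Look up the job UUID in a configuration map.
     * Assumption: when the UUID is required and absent, fail fast.
     */
    static String resolveJobUuid(Map<String, String> conf, boolean requireUuid) {
        String uuid = conf.get(SPARK_WRITE_UUID);
        if (uuid != null && !uuid.isEmpty()) {
            return uuid;
        }
        if (requireUuid) {
            throw new IllegalStateException(
                "Job UUID is required but " + SPARK_WRITE_UUID + " is not set");
        }
        // What a real committer falls back to is out of scope for this sketch.
        return null;
    }

    /** Local work path, mirroring the shape of workPath in the log. */
    static String localWorkPath(String baseDir, String uuid, String attemptId) {
        return baseDir + "/" + uuid + "-" + attemptId
            + "/_temporary/0/_temporary/" + attemptId;
    }

    /** Cluster FS staging path, mirroring commitsDirectory in the log. */
    static String stagingUploadsPath(String stagingRoot, String user, String uuid) {
        return stagingRoot + "/" + user + "/" + uuid + "/staging-uploads";
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
            SPARK_WRITE_UUID, "d6b6cd70-0303-46a6-8ff4-240dd14511d6");
        String uuid = resolveJobUuid(conf, true);
        String attempt = "attempt_20201109105536_0000_m_000000_0";
        System.out.println(localWorkPath("file:/tmp/test/s3a", uuid, attempt));
        System.out.println(stagingUploadsPath("file:/tmp/staging", "stevel", uuid));
    }
}
```

The base directories in `main` are placeholders; the point is only that the same UUID appears in both the per-task local work path and the per-job staging directory.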
