This is an automated email from the ASF dual-hosted git repository.
jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new 533ec04 [CARBONDATA-3570] Change task number to jobid+taskid for
FileFormat
533ec04 is described below
commit 533ec04e11a932ea9fcc70d8e3566c5b77e09da8
Author: ajantha-bhat <[email protected]>
AuthorDate: Mon Nov 4 20:19:01 2019 +0530
[CARBONDATA-3570] Change task number to jobid+taskid for FileFormat
problem : In the case of FileFormat, a task retry gets a different task
number, so the file name becomes different.
cause : Every time, in the executor, a UUID is used to generate the task number
for the carbondata file name. So if the previous task copied only some of the
files and then crashed, the new task creates new files instead of overwriting
the existing files.
solution: make the task number in the file name the same across task retries
by using taskid + job number
Before:
edff9ef0cd66484ebbaff6ad683f0f62_batchno0-0-null-88487116477262.carbonindex
After:
2019110420310511x0_batchno0-0-null-14565827943299.carbonindex
This closes #3433
---
.../sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
index 25296a0..1556b46 100644
--- a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
+++ b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
@@ -154,8 +154,11 @@ class SparkCarbonFileFormat extends FileFormat
path
}
context.getConfiguration.set("carbon.outputformat.writepath",
updatedPath)
+ // "jobid"+"x"+"taskid", task retry should have same task number
context.getConfiguration.set("carbon.outputformat.taskno",
- UUID.randomUUID().toString.replace("-", ""))
+ context.getTaskAttemptID.getJobID.getJtIdentifier +
+ context.getTaskAttemptID.getJobID.getId
+ + 'x' + context.getTaskAttemptID.getTaskID.getId)
new CarbonOutputWriter(path, context, dataSchema.fields)
}
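
A minimal sketch of why the new task number is stable across retries. It does
not use the Hadoop classes: AttemptId and TaskNoSketch are hypothetical
stand-ins for the values read from context.getTaskAttemptID, and the sample
values (including how the "After" prefix splits into identifier and job id)
are assumptions chosen for illustration only.

```scala
// Hypothetical stand-in for the identifiers read from Hadoop's TaskAttemptID.
// Not the real class; it only carries the four values that matter here.
final case class AttemptId(jtIdentifier: String, jobId: Int, taskId: Int, attempt: Int)

object TaskNoSketch {
  // Same shape as the patched expression: jtIdentifier + jobId + 'x' + taskId.
  // The attempt number is not part of the result, so a retry of the same
  // task yields the same string.
  def taskNo(id: AttemptId): String =
    s"${id.jtIdentifier}${id.jobId}x${id.taskId}"

  def main(args: Array[String]): Unit = {
    // Assumed sample values, not taken from a real job.
    val first = AttemptId("20191104203105", 11, 0, attempt = 0)
    val retry = first.copy(attempt = 1)

    println(taskNo(first))
    println(taskNo(first) == taskNo(retry))
  }
}
```

Because the attempt number is left out, both attempts produce the same task
number, so a retried task targets the same file name as the crashed attempt.
A random UUID, by contrast, differs on every call.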