This is an automated email from the ASF dual-hosted git repository.

jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new 533ec04  [CARBONDATA-3570] Change task number to jobid+taskid for FileFormat
533ec04 is described below

commit 533ec04e11a932ea9fcc70d8e3566c5b77e09da8
Author: ajantha-bhat <[email protected]>
AuthorDate: Mon Nov 4 20:19:01 2019 +0530

    [CARBONDATA-3570] Change task number to jobid+taskid for FileFormat
    
    Problem: With the file format, a task retry gets a different task number,
    so the carbondata file name changes.
    
    Cause: In the executor, a new UUID is generated every time as the task
    number for the carbondata file name. If the previous task copied only some
    of the files before it crashed, the new task creates new files instead of
    overwriting the existing ones.
    
    Solution: Keep the task number in the file name the same across task
    retries by using job ID + task ID.
    
    Before:
    edff9ef0cd66484ebbaff6ad683f0f62_batchno0-0-null-88487116477262.carbonindex
    After:
    2019110420310511x0_batchno0-0-null-14565827943299.carbonindex
    
    This closes #3433
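
As a rough illustration of the solution above, the sketch below builds the task number from the job and task identifiers only, so a retry of the same task yields the same string. `AttemptIds` and `TaskNoSketch` are hypothetical stand-ins for the values read from Hadoop's `TaskAttemptID`; they are not part of CarbonData or Hadoop. The sample values assume the "After" file name splits into a job tracker identifier of `20191104203105`, job ID 11, and task ID 0, which is a guess based on that one file name.

```scala
// Hypothetical stand-in for the values exposed by a Hadoop TaskAttemptID.
final case class AttemptIds(jtIdentifier: String, jobId: Int, taskId: Int, attempt: Int)

object TaskNoSketch {
  // Same shape as the committed expression: jtIdentifier + jobId + 'x' + taskId.
  // The attempt number is deliberately ignored so that retries share one value.
  def taskNo(ids: AttemptIds): String =
    ids.jtIdentifier + ids.jobId + 'x' + ids.taskId

  def main(args: Array[String]): Unit = {
    // Assumed decomposition of the "After" example; not confirmed by the commit.
    val first = AttemptIds("20191104203105", 11, 0, attempt = 0)
    val retry = first.copy(attempt = 1)
    // A retried attempt maps to the same task number, hence the same file name prefix.
    assert(taskNo(first) == taskNo(retry))
    println(taskNo(first))
  }
}
```

A UUID-based task number would differ between `first` and `retry`, which is the behavior the commit removes.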
---
 .../sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
index 25296a0..1556b46 100644
--- a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
+++ b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
@@ -154,8 +154,11 @@ class SparkCarbonFileFormat extends FileFormat
           path
         }
         context.getConfiguration.set("carbon.outputformat.writepath", updatedPath)
+        // "jobid"+"x"+"taskid", task retry should have same task number
         context.getConfiguration.set("carbon.outputformat.taskno",
-          UUID.randomUUID().toString.replace("-", ""))
+          context.getTaskAttemptID.getJobID.getJtIdentifier +
+          context.getTaskAttemptID.getJobID.getId
+          + 'x' + context.getTaskAttemptID.getTaskID.getId)
         new CarbonOutputWriter(path, context, dataSchema.fields)
       }
 
