ajantha-bhat commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r490760218
##########
File path:
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -267,9 +266,8 @@ object CarbonDataRDDFactory {
throw new Exception("Exception in compaction " +
exception.getMessage)
}
} finally {
- executor.shutdownNow()
try {
- compactor.deletePartialLoadsInCompaction()
Review comment:
a) When compaction retries, it reuses the same segment ID; if the stale files
from the failed attempt are not cleaned, the retry produces duplicate data.
So, before this change, #3934 needs to be merged so that a unique segment ID
is used for each compaction retry.
b) Please check and move the logic of `deletePartialLoadsInCompaction` into
the clean files command instead of removing it permanently. If the clean files
command does not have this logic, it may not be able to clean stale files.
c) Also, if the purpose of this PR is to avoid accidental data loss, you need
to handle `cleanStaleDeltaFiles` in `CarbonUpdateUtil.java` and identify other
such places as well. Handling only a few places will not guarantee that data
loss cannot occur.
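To make point (a) concrete, here is a minimal, self-contained sketch (the names `rowsVisible` and `StaleSegmentSketch` are illustrative, not CarbonData APIs): when a failed compaction leaves a stale data file under a segment ID and the retry writes into the same segment ID, a reader that trusts that segment sees both files and the rows are duplicated; a unique segment ID per retry avoids this because the stale file's segment is never marked valid.

```scala
// Hypothetical sketch of the duplicate-data problem from point (a).
// A "data file" is modeled as (segmentId, rows); a reader only sees rows
// from files whose segment ID is in the valid-segment set.
object StaleSegmentSketch {
  def rowsVisible(files: Seq[(String, Seq[Int])],
                  validSegments: Set[String]): Seq[Int] =
    files.collect { case (seg, rows) if validSegments.contains(seg) => rows }.flatten

  def main(args: Array[String]): Unit = {
    // A failed compaction attempt left this stale file behind.
    val staleFile = ("2.1", Seq(1, 2, 3))
    // The retry reuses the same segment ID ("2.1"), so both files are visible.
    val retrySameId = ("2.1", Seq(1, 2, 3))
    val dup = rowsVisible(Seq(staleFile, retrySameId), Set("2.1"))
    println(s"same segment id -> ${dup.size} rows")   // duplicated rows

    // With a unique segment ID per retry (the idea behind #3934), the stale
    // file's segment is never marked valid, so only the retry's rows are read.
    val retryUniqueId = ("2.2", Seq(1, 2, 3))
    val ok = rowsVisible(Seq(staleFile, retryUniqueId), Set("2.2"))
    println(s"unique segment id -> ${ok.size} rows")
  }
}
```

This is only a toy model of segment visibility; the actual fix still needs the stale-file cleanup discussed in points (b) and (c).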
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]