ajantha-bhat commented on a change in pull request #3935:
URL: https://github.com/apache/carbondata/pull/3935#discussion_r490760218
##########
File path:
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -267,9 +266,8 @@ object CarbonDataRDDFactory {
throw new Exception("Exception in compaction " +
exception.getMessage)
}
} finally {
- executor.shutdownNow()
try {
- compactor.deletePartialLoadsInCompaction()
Review comment:
a) When compaction retries, it reuses the same segment ID; if the stale files
from the failed attempt are not cleaned, the retry produces duplicate data.
So, before this change, #3934 needs to be merged so that a unique segment ID
is used for each compaction retry.
b) Please check and move the logic of `deletePartialLoadsInCompaction` into
the clean files command instead of removing it permanently. If the clean files
command does not have this logic, it may not be able to clean stale files.
c) Also, if the purpose of this PR is to avoid accidental data loss, you need
to handle `cleanStaleDeltaFiles` in `CarbonUpdateUtil.java` and identify other
such places as well. Handling only a few places will not guarantee that data
loss cannot occur.
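To make point (a) concrete, here is a minimal, self-contained sketch (the names `rowsVisible` and `StaleSegmentSketch` are illustrative, not CarbonData APIs): when a failed compaction leaves a stale data file under a segment ID and the retry writes into the same segment ID, a reader that trusts that segment sees both files and the rows are duplicated; a unique segment ID per retry avoids this because the stale file's segment is never marked valid.

```scala
// Hypothetical sketch of the duplicate-data problem from point (a).
// A "data file" is modeled as (segmentId, rows); a reader only sees rows
// from files whose segment ID is in the valid-segment set.
object StaleSegmentSketch {
  def rowsVisible(files: Seq[(String, Seq[Int])],
                  validSegments: Set[String]): Seq[Int] =
    files.collect { case (seg, rows) if validSegments.contains(seg) => rows }.flatten

  def main(args: Array[String]): Unit = {
    // A failed compaction attempt left this stale file behind.
    val staleFile = ("2.1", Seq(1, 2, 3))
    // The retry reuses the same segment ID ("2.1"), so both files are visible.
    val retrySameId = ("2.1", Seq(1, 2, 3))
    val dup = rowsVisible(Seq(staleFile, retrySameId), Set("2.1"))
    println(s"same segment id -> ${dup.size} rows")   // duplicated rows

    // With a unique segment ID per retry (the idea behind #3934), the stale
    // file's segment is never marked valid, so only the retry's rows are read.
    val retryUniqueId = ("2.2", Seq(1, 2, 3))
    val ok = rowsVisible(Seq(staleFile, retryUniqueId), Set("2.2"))
    println(s"unique segment id -> ${ok.size} rows")
  }
}
```

This is only a toy model of segment visibility; the actual fix still needs the stale-file cleanup discussed in points (b) and (c).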
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]