dhatchayani commented on a change in pull request #3676: [WIP]Clean up the data
file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#discussion_r398470227
##########
File path:
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/CarbonSIRebuildRDD.scala
##########
@@ -321,6 +324,26 @@ class CarbonSIRebuildRDD[K, V](
     LOGGER.info("Closing compaction processor instance to clean up loading resources")
processor.close()
}
+
+ // delete all the old data files which are used for merging
+ splits.asScala.foreach { split =>
+ val carbonFile = FileFactory.getCarbonFile(split.getFilePath)
+ carbonFile.delete()
+ }
+
+    // delete the indexfile/merge index carbonFile of old data files
+    val segmentPath = FileFactory.getCarbonFile(indexTable.getSegmentPath(segmentId))
+    val indexFiles = segmentPath.listFiles(new CarbonFileFilter {
+      override def accept(carbonFile: CarbonFile): Boolean = {
+        (carbonFile.getName.endsWith(CarbonTablePath.INDEX_FILE_EXT) ||
+          carbonFile.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) &&
+        DataFileUtil.getTimeStampFromFileName(carbonFile.getAbsolutePath).toLong <
+          carbonLoadModelCopy.getFactTimeStamp
+      }
+    })
+    indexFiles.foreach { indexFile =>
+      indexFile.delete()
 Review comment:
Review comment:
Please test the scenario where:
(1) the index files are already queried and cached before the rebuild, and
(2) the rebuild and a query then run concurrently.
In this scenario the query will pick up the old index file and proceed, but if the
rebuild deletes that file in the meantime, it becomes unavailable and the query will
either throw an exception or return an empty result set.
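
The race described above can be sketched with plain `java.nio.file` calls (no CarbonData APIs; the file name and the single-threaded interleaving are illustrative assumptions, a real query and rebuild would run in separate processes):

```scala
import java.nio.file.{Files, Path}

// Minimal sketch of the reviewer's concern: a query caches a reference to an
// index file, the concurrent rebuild deletes that file, and the query's later
// read finds it gone.
object StaleIndexFileRace {
  def main(args: Array[String]): Unit = {
    // hypothetical old index file, named like a pre-rebuild segment file
    val indexFile: Path = Files.createTempFile("part-0-0_batchno0-0-0-100", ".carbonindex")

    // query side: the path is resolved and cached before the rebuild runs
    val cachedPath: Path = indexFile

    // rebuild side: the old index file is deleted after merging, as in the PR
    Files.delete(indexFile)

    // query side resumes with the cached path; the file no longer exists,
    // so an actual read here would fail with NoSuchFileException
    assert(!Files.exists(cachedPath),
      "cached index file was deleted by the concurrent rebuild")
  }
}
```

A common way out of this race is to keep the old files until no reader can still hold a reference to them (for example, deferring the physical delete to a later cleanup pass), rather than deleting them inside the rebuild itself.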
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services