akashrn5 commented on a change in pull request #3677: [wip]Fix segment cache 
issue with parallel spark applications on same store
URL: https://github.com/apache/carbondata/pull/3677#discussion_r397343014
 
 

 ##########
 File path: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
 ##########
 @@ -369,6 +379,33 @@ private void modifyColumnSchemaForSortColumn(ColumnSchema columnSchema, boolean
     return tableBlockIndexUniqueIdentifiers;
   }
 
+  /**
+   * This case is added for a case where, there are two applications running, and in one application
+   * operations happened like SI rebuild, update or delete case, then the cache should be updated as
+   * well. The cache updation happens for same application, but other application may fail to query
+   * or may give wrong result. Since we overwrite the segment file in these scenarios, check the
+   * timestamp, and if modified, clear from the cache.
+   */
+  private void clearSegmentMapIfSegmentUpdated(String latestSegmentFilePath, Segment segment) {
+    SegmentBlockIndexInfo segmentBlockIndexInfo = segmentMap.get(segment.getSegmentNo());
 
 Review comment:
   This was done for non-transactional tables. In the transactional case, a
segment can contain millions of files, so I think this approach will perform
better. In the future, all operations will be based on the segment file, so I
think it will be feasible. What do you think?
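
To make the performance argument concrete, here is a minimal sketch of the
timestamp check described in the Javadoc above. It is not the code from this
PR: the class SegmentCacheSketch, the CachedSegment type, and all method names
are hypothetical, and the segment file's last-modified time is passed in as a
plain long rather than read from the store. The point is that invalidation
costs one comparison per segment, regardless of how many files the segment
contains.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a per-segment cache (not CarbonData's API).
class SegmentCacheSketch {

  // Hypothetical cache entry: remembers the segment file's last-modified
  // timestamp as it was when the entry was loaded into the cache.
  static final class CachedSegment {
    final long segmentFileTimestamp;

    CachedSegment(long segmentFileTimestamp) {
      this.segmentFileTimestamp = segmentFileTimestamp;
    }
  }

  // Keyed by segment number, similar in spirit to segmentMap in the diff.
  private final Map<String, CachedSegment> segmentMap = new ConcurrentHashMap<>();

  // Record that a segment was loaded while its segment file had this timestamp.
  void put(String segmentNo, long segmentFileTimestamp) {
    segmentMap.put(segmentNo, new CachedSegment(segmentFileTimestamp));
  }

  boolean isCached(String segmentNo) {
    return segmentMap.containsKey(segmentNo);
  }

  // Evict the entry if the segment file was overwritten since it was cached.
  // Returns true only when an entry was actually removed.
  boolean clearIfSegmentUpdated(String segmentNo, long latestSegmentFileTimestamp) {
    CachedSegment cached = segmentMap.get(segmentNo);
    if (cached == null) {
      return false; // nothing cached for this segment
    }
    if (cached.segmentFileTimestamp == latestSegmentFileTimestamp) {
      return false; // segment file unchanged, cache entry is still valid
    }
    // Remove only if the entry is still the one we inspected, so a concurrent
    // reload by another thread is not thrown away.
    return segmentMap.remove(segmentNo, cached);
  }
}
```

In a real store the timestamp would come from the file system's metadata for
the segment file; how that is obtained is left out here on purpose.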
