[GitHub] [carbondata] manishgupta88 commented on a change in pull request #3148: [CARBONDATA-3293] Prune datamaps improvement for count(*)

GitBox Mon, 18 Mar 2019 09:46:43 -0700

manishgupta88 commented on a change in pull request #3148: [CARBONDATA-3293] 
Prune datamaps improvement for count(*)
URL: https://github.com/apache/carbondata/pull/3148#discussion_r266535183


 ##########
 File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ##########
 @@ -624,36 +624,43 @@ public BlockMappingVO getBlockRowCount(Job job, CarbonTable table,
           .clearInvalidSegments(getOrCreateCarbonTable(job.getConfiguration()),
               toBeCleanedSegments);
     }
-    List<ExtendedBlocklet> blocklets =
-        blockletMap.prune(filteredSegment, (FilterResolverIntf) null, partitions);
-    for (ExtendedBlocklet blocklet : blocklets) {
-      String blockName = blocklet.getPath();
-      blockName = CarbonTablePath.getCarbonDataFileName(blockName);
-      blockName = blockName + CarbonTablePath.getCarbonDataExtension();
-
-      long rowCount = blocklet.getDetailInfo().getRowCount();
-
-      String segmentId = Segment.toSegment(blocklet.getSegmentId()).getSegmentNo();
-      String key = CarbonUpdateUtil.getSegmentBlockNameKey(segmentId, blockName);
-
-      // if block is invalid then don't add the count
-      SegmentUpdateDetails details = updateStatusManager.getDetailsForABlock(key);
-
-      if (null == details || !CarbonUpdateUtil.isBlockInvalid(details.getSegmentStatus())) {
-        Long blockCount = blockRowCountMapping.get(key);
-        if (blockCount == null) {
-          blockCount = 0L;
-          Long count = segmentAndBlockCountMapping.get(segmentId);
-          if (count == null) {
-            count = 0L;
+    if (isIUDTable || isUpdateFlow) {
+      Map<String, Long> blockletToRowCountMap =
+          defaultDataMap.getBlockRowCount(filteredSegment, partitions, defaultDataMap);
+      // key is the (segmentId","+blockletPath) and key is the row count of that blocklet
+      for (Map.Entry<String, Long> eachBlocklet : blockletToRowCountMap.entrySet()) {
+        String[] segmentIdAndPath = eachBlocklet.getKey().split(",", 2);
+        String segmentId = segmentIdAndPath[0];
+        String blockName = segmentIdAndPath[1];
+        blockName = CarbonTablePath.getCarbonDataFileName(blockName);
+        blockName = blockName + CarbonTablePath.getCarbonDataExtension();
 
 Review comment:
   I don't see the file path being used, so while filling the map you can store only the file name with its extension. The duplicate operations here can then be removed. While filling the map you can put
   `segmentId + '_' + new String(dataMapRow.getByteArray(FILE_PATH_INDEX), CarbonCommonConstants.DEFAULT_CHARSET_CLASS) + CarbonTablePath.getCarbonDataExtension()`
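
   To illustrate the idea, here is a minimal, self-contained sketch of building the key once on the producer side so that the consumer only has to split it. It does not use the CarbonData classes: `BlockKeySketch`, `buildKey`, `splitKey`, and the sample file name are hypothetical, the `.carbondata` extension is an assumed stand-in for `CarbonTablePath.getCarbonDataExtension()`, and the stored bytes are assumed to hold the file name without its extension. The diff splits on `,` while the expression above joins with `'_'`; the sketch keeps `,` so that the existing `split(",", 2)` still applies.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the map-filling and map-reading code; not CarbonData API.
final class BlockKeySketch {
  // Assumed value; the real code would call CarbonTablePath.getCarbonDataExtension().
  static final String DATA_FILE_EXTENSION = ".carbondata";
  // Separator that the consumer in the diff splits on.
  static final String SEPARATOR = ",";

  // Producer side: store segmentId plus file name with extension, built once.
  static String buildKey(String segmentId, byte[] fileNameBytes) {
    String fileName = new String(fileNameBytes, StandardCharsets.UTF_8);
    return segmentId + SEPARATOR + fileName + DATA_FILE_EXTENSION;
  }

  // Consumer side: a single split, with no further file name manipulation.
  static String[] splitKey(String key) {
    return key.split(SEPARATOR, 2);
  }

  public static void main(String[] args) {
    Map<String, Long> rowCounts = new LinkedHashMap<>();
    byte[] stored = "part-0-0_batchno0-0-0-1".getBytes(StandardCharsets.UTF_8);
    // Two blocklets of the same block accumulate under one key.
    rowCounts.merge(buildKey("2", stored), 100L, Long::sum);
    rowCounts.merge(buildKey("2", stored), 50L, Long::sum);
    for (Map.Entry<String, Long> entry : rowCounts.entrySet()) {
      String[] segmentIdAndBlockName = splitKey(entry.getKey());
      System.out.println(segmentIdAndBlockName[0] + " "
          + segmentIdAndBlockName[1] + " " + entry.getValue());
    }
  }
}
```

   With keys built this way, the loop in `getBlockRowCount` would no longer need the `getCarbonDataFileName` and `getCarbonDataExtension` calls on each entry.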

