This is an automated email from the ASF dual-hosted git repository.

jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
     new 063d9b2  [CARBONDATA-3894] [IUD]decrease the size of tableupdatestaus file by remove the invalid segments not exist in tablestatus
063d9b2 is described below

commit 063d9b2aff86f66f22ce75bc6905affc8a4bd8df
Author: Zhangshunyu <zhangshunyu1...@126.com>
AuthorDate: Thu Jul 9 11:23:39 2020 +0800

    [CARBONDATA-3894] [IUD]decrease the size of tableupdatestaus file by remove the invalid segments not exist in tablestatus

    Why is this PR needed?
    The tableupdatestatus file keeps the update info of every segment, even
    after a compacted segment has been deleted. The file therefore grows
    quickly, which is bad for performance. After this change, the
    tableupdatestatus file size can decrease from ~MB to ~KB.

    What changes were proposed in this PR?
    Remove the invalid segments, that is, segments that no longer exist in
    tablestatus, before writing the tableupdatestatus file.

    Does this PR introduce any user interface change?
    No

    Is any new testcase added?
    No

    This closes #3833
---
 .../apache/carbondata/core/mutate/CarbonUpdateUtil.java  | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
index e915c66..77ebf3e 100644
--- a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
+++ b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
@@ -148,7 +148,21 @@ public class CarbonUpdateUtil {
           mergeSegmentUpdate(isCompaction, oldList, newBlockEntry);
         }
 
-        segmentUpdateStatusManager.writeLoadDetailsIntoFile(oldList, updateStatusFileIdentifier);
+        List<SegmentUpdateDetails> updateDetailsValidSeg = new ArrayList<>();
+        Set<String> loadDetailsSet = new HashSet<>();
+        for (LoadMetadataDetails details : segmentUpdateStatusManager.getLoadMetadataDetails()) {
+          loadDetailsSet.add(details.getLoadName());
+        }
+        for (SegmentUpdateDetails updateDetails : oldList) {
+          if (loadDetailsSet.contains(updateDetails.getSegmentName())) {
+            // we should only keep the update info of segments in table status, especially after
+            // compaction and clean files some compacted segments will be removed. It can keep
+            // tableupdatestatus file in small size which is good for performance.
+            updateDetailsValidSeg.add(updateDetails);
+          }
+        }
+        segmentUpdateStatusManager
+            .writeLoadDetailsIntoFile(updateDetailsValidSeg, updateStatusFileIdentifier);
         status = true;
       } else {
         LOGGER.error("Not able to acquire the segment update lock.");
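The filtering step in this patch can be illustrated in isolation. The following is a minimal sketch, not CarbonData code: `UpdateStatusFilterSketch`, `UpdateEntry`, and `keepEntriesOfLiveSegments` are hypothetical stand-ins for `SegmentUpdateDetails` and the loop added above, and the segment and block names are made up for the example. It assumes only what the diff shows: collect the segment names known to the table status into a set, then keep the update entries whose segment name is in that set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the filtering idea; none of these types are CarbonData's.
class UpdateStatusFilterSketch {

  // Simplified stand-in for one update entry: the segment it belongs to and a block name.
  static final class UpdateEntry {
    final String segmentName;
    final String blockName;

    UpdateEntry(String segmentName, String blockName) {
      this.segmentName = segmentName;
      this.blockName = blockName;
    }
  }

  // Keeps only the entries whose segment is still listed in the table status.
  static List<UpdateEntry> keepEntriesOfLiveSegments(
      List<UpdateEntry> entries, List<String> tableStatusSegments) {
    // A set gives constant-time membership checks for each entry.
    Set<String> liveSegments = new HashSet<>(tableStatusSegments);
    List<UpdateEntry> kept = new ArrayList<>();
    for (UpdateEntry entry : entries) {
      if (liveSegments.contains(entry.segmentName)) {
        kept.add(entry);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Segments "0" and "1" were compacted and cleaned; only "0.1" and "2" remain.
    List<UpdateEntry> entries = Arrays.asList(
        new UpdateEntry("0", "part-0-0"),
        new UpdateEntry("1", "part-0-1"),
        new UpdateEntry("2", "part-0-2"));
    List<UpdateEntry> kept =
        keepEntriesOfLiveSegments(entries, Arrays.asList("0.1", "2"));
    for (UpdateEntry entry : kept) {
      System.out.println(entry.segmentName + " " + entry.blockName);
    }
  }
}
```

Building the set once keeps the pass linear in the number of update entries, which matters when the old list has accumulated entries for many removed segments.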