ajantha-bhat commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r515759986
########## File path: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ##########

@@ -808,19 +808,6 @@ private CarbonCommonConstants() {
    */
   public static final int NUMBER_OF_SEGMENT_COMPACTED_PERTIME_UPPER_LIMIT = 10000;

-  /**
-   * Number of Update Delta files which is the Threshold for IUD compaction.
-   * Only accepted Range is 0 - 10000. Outside this range system will pick default value.
-   */
-  @CarbonProperty
-  public static final String UPDATE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION =

Review comment:
Please look up `carbon.horizontal.update.compaction.threshold` and `UPDATE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION` in the code; they are still present in many places.

########## File path: docs/configuration-parameters.md ##########

@@ -104,8 +104,7 @@ This section provides the details of all the configurations required for the Car
 | carbon.number.of.cores.while.compacting | 2 | Number of cores to be used while compacting data. This also determines the number of threads to be used to read carbondata files in parallel. |
 | carbon.compaction.level.threshold | 4, 3 | Each CarbonData load will create one segment, if every load is small in size it will generate many small file over a period of time impacting the query performance. This configuration is for minor compaction which decides how many segments to be merged. Configuration is of the form (x,y). Compaction will be triggered for every x segments and form a single level 1 compacted segment. When the number of compacted level 1 segments reach y, compaction will be triggered again to merge them to form a single level 2 segment. For example: If it is set as 2, 3 then minor compaction will be triggered for every 2 segments. 3 is the number of level 1 compacted segments which is further compacted to new segment. **NOTE:** When ***carbon.enable.auto.load.merge*** is **true**, configuring higher values cause overall data loading time to increase as compaction will be triggered after data loading is complete but status is not returned till compaction is complete. But compacting more number of segments can increase query performance. Hence optimal values needs to be configured based on the business scenario. Valid values are between 0 to 100. |
 | carbon.major.compaction.size | 1024 | To improve query performance and all the segments can be merged and compacted to a single segment upto configured size. This Major compaction size can be configured using this parameter. Sum of the segments which is below this threshold will be merged. This value is expressed in MB. |
-| carbon.horizontal.compaction.enable | true | CarbonData supports DELETE/UPDATE functionality by creating delta data files for existing carbondata files. These delta files would grow as more number of DELETE/UPDATE operations are performed. Compaction of these delta files are termed as horizontal compaction. This configuration is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur in case the delta (DELETE/ UPDATE) files becomes more than specified threshold. **NOTE:** Having many delta files will reduce the query performance as scan has to happen on all these files before the final state of data can be decided. Hence it is advisable to keep horizontal compaction enabled and configure reasonable values to ***carbon.horizontal.UPDATE.compaction.threshold*** and ***carbon.horizontal.DELETE.compaction.threshold*** |
-| carbon.horizontal.update.compaction.threshold | 1 | This configuration specifies the threshold limit on number of UPDATE delta files within a segment. In case the number of delta files goes beyond the threshold, the UPDATE delta files within the segment becomes eligible for horizontal compaction and are compacted into single UPDATE delta file. Values range between 1 to 10000. |
+| carbon.horizontal.compaction.enable | true | CarbonData supports DELETE/UPDATE functionality by creating delta data files for existing carbondata files. These delta files would grow as more number of DELETE/UPDATE operations are performed. Compaction of these delete delta files are termed as horizontal compaction. This configuration is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur in case the delete delta files becomes more than specified threshold. **NOTE:** Having many delta files will reduce the query performance as scan has to happen on all these files before the final state of data can be decided. Hence it is advisable to keep horizontal compaction enabled and configure reasonable values to ***carbon.horizontal.UPDATE.compaction.threshold*** and ***carbon.horizontal.DELETE.compaction.threshold*** |

Review comment:
Do we need to remove `carbon.horizontal.UPDATE.compaction.threshold` here as well?

########## File path: processing/src/main/java/org/apache/carbondata/processing/merger/CompactionType.java ##########

@@ -26,7 +26,6 @@
 public enum CompactionType {
   MINOR,
   MAJOR,
-  IUD_UPDDEL_DELTA,

Review comment:
Don't remove the enum constant; it may cause compatibility issues.
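The concern about removing the enum constant can be illustrated with a small standalone sketch. The enum below is a simplified stand-in, not the actual CarbonData `CompactionType`: if an older version persisted the constant's name (for example in table status metadata) and a newer binary drops the constant, resolving that name with `Enum.valueOf` fails at runtime.

```java
// Standalone sketch: why dropping an enum constant can break compatibility.
// SimpleCompactionType is a simplified stand-in for CarbonData's CompactionType
// after IUD_UPDDEL_DELTA has been removed.
public class EnumCompatDemo {
  enum SimpleCompactionType { MINOR, MAJOR }

  public static void main(String[] args) {
    // A name persisted by an older version still resolves if the constant exists.
    SimpleCompactionType ok = SimpleCompactionType.valueOf("MINOR");
    System.out.println("resolved: " + ok);

    // A name persisted for the removed constant now fails at runtime.
    try {
      SimpleCompactionType.valueOf("IUD_UPDDEL_DELTA");
      System.out.println("resolved: IUD_UPDDEL_DELTA");
    } catch (IllegalArgumentException e) {
      System.out.println("failed to resolve: IUD_UPDDEL_DELTA");
    }
  }
}
```

Keeping the constant (even if unused by new code paths) lets older persisted names still deserialize cleanly.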
########## File path: core/src/main/java/org/apache/carbondata/core/segmentmeta/SegmentMetaDataInfoStats.java ##########

@@ -86,28 +85,14 @@ public synchronized SegmentMetaDataInfo getTableSegmentMetaDataInfo(String table
   public synchronized void setBlockMetaDataInfo(String tableName, String segmentId,
       BlockColumnMetaDataInfo currentBlockColumnMetaInfo) {
     // check if tableName is present in tableSegmentMetaDataInfoMap
-    if (!this.tableSegmentMetaDataInfoMap.isEmpty() && null != this.tableSegmentMetaDataInfoMap
-        .get(tableName) && null != this.tableSegmentMetaDataInfoMap.get(tableName).get(segmentId)) {
-      // get previous blockColumn metadata information
-      BlockColumnMetaDataInfo previousBlockColumnMetaInfo =
-          this.tableSegmentMetaDataInfoMap.get(tableName).get(segmentId);
-      // compare and get updated min and max values
-      byte[][] updatedMin = BlockIndex.compareAndUpdateMinMax(previousBlockColumnMetaInfo.getMin(),

Review comment:
ok

########## File path: core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java ##########

@@ -373,9 +374,7 @@ public static void cleanStaleDeltaFiles(CarbonTable table, final String timeStam
       @Override
       public boolean accept(CarbonFile file) {
         String fileName = file.getName();
-        return (fileName.endsWith(timeStamp + CarbonCommonConstants.UPDATE_DELTA_FILE_EXT)

Review comment:
We can also remove `UPDATE_DELTA_FILE_EXT` and `UPDATE_INDEX_FILE_EXT` from the code.

########## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala ##########

@@ -158,8 +159,6 @@ private[sql] case class CarbonProjectForUpdateCommand(
           "for the update key")
       }
     }
-    // handle the clean up of IUD.
-    CarbonUpdateUtil.cleanUpDeltaFiles(carbonTable, false)

Review comment:
@QiangCai, @marchpure: In the release notes or upgrade guide, we can suggest that users run horizontal compaction before upgrading if update delta files are present. Just my opinion.
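For context, the `accept` filter in `cleanStaleDeltaFiles` above matches files whose names end with the stale timestamp plus a delta-file extension. A self-contained sketch of that matching logic follows; it uses plain strings in place of `CarbonFile`, and the `.deletedelta` extension and the sample file names are illustrative assumptions, not the authoritative constants.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of stale-delta-file matching: keep files whose names end
// with "<staleTimestamp><extension>". The ".deletedelta" extension mirrors
// CarbonData's delete-delta file suffix but is an assumption for illustration.
public class StaleDeltaFilterDemo {
  static final String DELETE_DELTA_EXT = ".deletedelta";

  static List<String> staleDeltaFiles(List<String> fileNames, String staleTimestamp) {
    List<String> stale = new ArrayList<>();
    for (String name : fileNames) {
      // Same shape as the accept() filter: suffix match on timestamp + extension.
      if (name.endsWith(staleTimestamp + DELETE_DELTA_EXT)) {
        stale.add(name);
      }
    }
    return stale;
  }

  public static void main(String[] args) {
    List<String> files = List.of(
        "part-0-0_batchno0-0-100.deletedelta",  // stale: timestamp 100
        "part-0-0_batchno0-0-200.deletedelta",  // different timestamp, kept
        "part-0-0_batchno0-0-100.carbondata");  // data file, not a delete delta
    System.out.println(staleDeltaFiles(files, "100"));
  }
}
```

With the update-delta code paths removed in this PR, only the delete-delta suffix check remains, which is why the `UPDATE_DELTA_FILE_EXT` branch can go.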
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org