[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

GitBox Tue, 03 Nov 2020 06:03:48 -0800


ajantha-bhat commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r515759986




##########
File path: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -808,19 +808,6 @@ private CarbonCommonConstants() {
    */
   public static final int NUMBER_OF_SEGMENT_COMPACTED_PERTIME_UPPER_LIMIT = 
10000;
 
-  /**
-   * Number of Update Delta files which is the Threshold for IUD compaction.
-   * Only accepted Range is 0 - 10000. Outside this range system will pick 
default value.
-   */
-  @CarbonProperty
-  public static final String UPDATE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION =

Review comment:
       please look up `carbon.horizontal.update.compaction.threshold` and 
`UPDATE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION` in code. still many places it 
is present

##########
File path: docs/configuration-parameters.md
##########
@@ -104,8 +104,7 @@ This section provides the details of all the configurations 
required for the Car
 | carbon.number.of.cores.while.compacting | 2 | Number of cores to be used 
while compacting data. This also determines the number of threads to be used to 
read carbondata files in parallel. |
 | carbon.compaction.level.threshold | 4, 3 | Each CarbonData load will create 
one segment, if every load is small in size it will generate many small file 
over a period of time impacting the query performance. This configuration is 
for minor compaction which decides how many segments to be merged. 
Configuration is of the form (x,y). Compaction will be triggered for every x 
segments and form a single level 1 compacted segment. When the number of 
compacted level 1 segments reach y, compaction will be triggered again to merge 
them to form a single level 2 segment. For example: If it is set as 2, 3 then 
minor compaction will be triggered for every 2 segments. 3 is the number of 
level 1 compacted segments which is further compacted to new segment. **NOTE:** 
When ***carbon.enable.auto.load.merge*** is **true**, configuring higher values 
cause overall data loading time to increase as compaction will be triggered 
after data loading is complete but status is not returned till compaction is c
 omplete. But compacting more number of segments can increase query 
performance. Hence optimal values needs to be configured based on the business 
scenario. Valid values are between 0 to 100. |
 | carbon.major.compaction.size | 1024 | To improve query performance and all 
the segments can be merged and compacted to a single segment upto configured 
size. This Major compaction size can be configured using this parameter. Sum of 
the segments which is below this threshold will be merged. This value is 
expressed in MB. |
-| carbon.horizontal.compaction.enable | true | CarbonData supports 
DELETE/UPDATE functionality by creating delta data files for existing 
carbondata files. These delta files would grow as more number of DELETE/UPDATE 
operations are performed. Compaction of these delta files are termed as 
horizontal compaction. This configuration is used to turn ON/OFF horizontal 
compaction. After every DELETE and UPDATE statement, horizontal compaction may 
occur in case the delta (DELETE/ UPDATE) files becomes more than specified 
threshold. **NOTE:** Having many delta files will reduce the query performance 
as scan has to happen on all these files before the final state of data can be 
decided. Hence it is advisable to keep horizontal compaction enabled and 
configure reasonable values to 
***carbon.horizontal.UPDATE.compaction.threshold*** and 
***carbon.horizontal.DELETE.compaction.threshold*** |
-| carbon.horizontal.update.compaction.threshold | 1 | This configuration 
specifies the threshold limit on number of UPDATE delta files within a segment. 
In case the number of delta files goes beyond the threshold, the UPDATE delta 
files within the segment becomes eligible for horizontal compaction and are 
compacted into single UPDATE delta file. Values range between 1 to 10000. |
+| carbon.horizontal.compaction.enable | true | CarbonData supports 
DELETE/UPDATE functionality by creating delta data files for existing 
carbondata files. These delta files would grow as more number of DELETE/UPDATE 
operations are performed. Compaction of these delete delta files are termed as 
horizontal compaction. This configuration is used to turn ON/OFF horizontal 
compaction. After every DELETE and UPDATE statement, horizontal compaction may 
occur in case the delete delta files becomes more than specified threshold. 
**NOTE:** Having many delta files will reduce the query performance as scan has 
to happen on all these files before the final state of data can be decided. 
Hence it is advisable to keep horizontal compaction enabled and configure 
reasonable values to ***carbon.horizontal.UPDATE.compaction.threshold*** and 
***carbon.horizontal.DELETE.compaction.threshold*** |

Review comment:
       Need to remove `carbon.horizontal.UPDATE.compaction.threshold` ?

##########
File path: 
processing/src/main/java/org/apache/carbondata/processing/merger/CompactionType.java
##########
@@ -26,7 +26,6 @@
 public enum CompactionType {
     MINOR,
     MAJOR,
-    IUD_UPDDEL_DELTA,

Review comment:
       Don't remove ENUM, it may give compatibility issues. 

##########
File path: 
core/src/main/java/org/apache/carbondata/core/segmentmeta/SegmentMetaDataInfoStats.java
##########
@@ -86,28 +85,14 @@ public synchronized SegmentMetaDataInfo 
getTableSegmentMetaDataInfo(String table
   public synchronized void setBlockMetaDataInfo(String tableName, String 
segmentId,
       BlockColumnMetaDataInfo currentBlockColumnMetaInfo) {
     // check if tableName is present in tableSegmentMetaDataInfoMap
-    if (!this.tableSegmentMetaDataInfoMap.isEmpty() && null != 
this.tableSegmentMetaDataInfoMap
-        .get(tableName) && null != 
this.tableSegmentMetaDataInfoMap.get(tableName).get(segmentId)) {
-      // get previous blockColumn metadata information
-      BlockColumnMetaDataInfo previousBlockColumnMetaInfo =
-          this.tableSegmentMetaDataInfoMap.get(tableName).get(segmentId);
-      // compare and get updated min and max values
-      byte[][] updatedMin = 
BlockIndex.compareAndUpdateMinMax(previousBlockColumnMetaInfo.getMin(),

Review comment:
       ok

##########
File path: 
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -373,9 +374,7 @@ public static void cleanStaleDeltaFiles(CarbonTable table, 
final String timeStam
         @Override
         public boolean accept(CarbonFile file) {
           String fileName = file.getName();
-          return (fileName.endsWith(timeStamp + 
CarbonCommonConstants.UPDATE_DELTA_FILE_EXT)

Review comment:
       Also can remove `UPDATE_DELTA_FILE_EXT` and `UPDATE_INDEX_FILE_EXT` from 
code

##########
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -158,8 +159,6 @@ private[sql] case class CarbonProjectForUpdateCommand(
                 "for the update key")
             }
           }
-          // handle the clean up of IUD.
-          CarbonUpdateUtil.cleanUpDeltaFiles(carbonTable, false)

Review comment:
       @QiangCai , @marchpure : In the release notes or upgrade guide, we can 
suggest to the user that if update files are present, compact (horizontal) 
before upgrading. Just my opinion. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3999: [CARBONDATA-4044] Fix dirty data in indexfile while IUD with stale data in segment folder

Reply via email to