ajantha-bhat commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r515759986
##########
File path:
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -808,19 +808,6 @@ private CarbonCommonConstants() {
*/
public static final int NUMBER_OF_SEGMENT_COMPACTED_PERTIME_UPPER_LIMIT =
10000;
- /**
- * Number of Update Delta files which is the Threshold for IUD compaction.
- * Only accepted Range is 0 - 10000. Outside this range system will pick
default value.
- */
- @CarbonProperty
- public static final String UPDATE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION =
Review comment:
please look up `carbon.horizontal.update.compaction.threshold` and
`UPDATE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION` in code. still many places it
is present
##########
File path: docs/configuration-parameters.md
##########
@@ -104,8 +104,7 @@ This section provides the details of all the configurations
required for the Car
| carbon.number.of.cores.while.compacting | 2 | Number of cores to be used
while compacting data. This also determines the number of threads to be used to
read carbondata files in parallel. |
| carbon.compaction.level.threshold | 4, 3 | Each CarbonData load will create
one segment, if every load is small in size it will generate many small file
over a period of time impacting the query performance. This configuration is
for minor compaction which decides how many segments to be merged.
Configuration is of the form (x,y). Compaction will be triggered for every x
segments and form a single level 1 compacted segment. When the number of
compacted level 1 segments reach y, compaction will be triggered again to merge
them to form a single level 2 segment. For example: If it is set as 2, 3 then
minor compaction will be triggered for every 2 segments. 3 is the number of
level 1 compacted segments which is further compacted to new segment. **NOTE:**
When ***carbon.enable.auto.load.merge*** is **true**, configuring higher values
cause overall data loading time to increase as compaction will be triggered
after data loading is complete but status is not returned till compaction is c
omplete. But compacting more number of segments can increase query
performance. Hence optimal values needs to be configured based on the business
scenario. Valid values are between 0 to 100. |
| carbon.major.compaction.size | 1024 | To improve query performance and all
the segments can be merged and compacted to a single segment upto configured
size. This Major compaction size can be configured using this parameter. Sum of
the segments which is below this threshold will be merged. This value is
expressed in MB. |
-| carbon.horizontal.compaction.enable | true | CarbonData supports
DELETE/UPDATE functionality by creating delta data files for existing
carbondata files. These delta files would grow as more number of DELETE/UPDATE
operations are performed. Compaction of these delta files are termed as
horizontal compaction. This configuration is used to turn ON/OFF horizontal
compaction. After every DELETE and UPDATE statement, horizontal compaction may
occur in case the delta (DELETE/ UPDATE) files becomes more than specified
threshold. **NOTE:** Having many delta files will reduce the query performance
as scan has to happen on all these files before the final state of data can be
decided. Hence it is advisable to keep horizontal compaction enabled and
configure reasonable values to
***carbon.horizontal.UPDATE.compaction.threshold*** and
***carbon.horizontal.DELETE.compaction.threshold*** |
-| carbon.horizontal.update.compaction.threshold | 1 | This configuration
specifies the threshold limit on number of UPDATE delta files within a segment.
In case the number of delta files goes beyond the threshold, the UPDATE delta
files within the segment becomes eligible for horizontal compaction and are
compacted into single UPDATE delta file. Values range between 1 to 10000. |
+| carbon.horizontal.compaction.enable | true | CarbonData supports
DELETE/UPDATE functionality by creating delta data files for existing
carbondata files. These delta files would grow as more number of DELETE/UPDATE
operations are performed. Compaction of these delete delta files are termed as
horizontal compaction. This configuration is used to turn ON/OFF horizontal
compaction. After every DELETE and UPDATE statement, horizontal compaction may
occur in case the delete delta files becomes more than specified threshold.
**NOTE:** Having many delta files will reduce the query performance as scan has
to happen on all these files before the final state of data can be decided.
Hence it is advisable to keep horizontal compaction enabled and configure
reasonable values to ***carbon.horizontal.UPDATE.compaction.threshold*** and
***carbon.horizontal.DELETE.compaction.threshold*** |
Review comment:
Need to remove `carbon.horizontal.UPDATE.compaction.threshold` ?
##########
File path:
processing/src/main/java/org/apache/carbondata/processing/merger/CompactionType.java
##########
@@ -26,7 +26,6 @@
public enum CompactionType {
MINOR,
MAJOR,
- IUD_UPDDEL_DELTA,
Review comment:
Don't remove ENUM, it may give compatibility issues.
##########
File path:
core/src/main/java/org/apache/carbondata/core/segmentmeta/SegmentMetaDataInfoStats.java
##########
@@ -86,28 +85,14 @@ public synchronized SegmentMetaDataInfo
getTableSegmentMetaDataInfo(String table
public synchronized void setBlockMetaDataInfo(String tableName, String
segmentId,
BlockColumnMetaDataInfo currentBlockColumnMetaInfo) {
// check if tableName is present in tableSegmentMetaDataInfoMap
- if (!this.tableSegmentMetaDataInfoMap.isEmpty() && null !=
this.tableSegmentMetaDataInfoMap
- .get(tableName) && null !=
this.tableSegmentMetaDataInfoMap.get(tableName).get(segmentId)) {
- // get previous blockColumn metadata information
- BlockColumnMetaDataInfo previousBlockColumnMetaInfo =
- this.tableSegmentMetaDataInfoMap.get(tableName).get(segmentId);
- // compare and get updated min and max values
- byte[][] updatedMin =
BlockIndex.compareAndUpdateMinMax(previousBlockColumnMetaInfo.getMin(),
Review comment:
ok
##########
File path:
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -373,9 +374,7 @@ public static void cleanStaleDeltaFiles(CarbonTable table,
final String timeStam
@Override
public boolean accept(CarbonFile file) {
String fileName = file.getName();
- return (fileName.endsWith(timeStamp +
CarbonCommonConstants.UPDATE_DELTA_FILE_EXT)
Review comment:
Also can remove `UPDATE_DELTA_FILE_EXT` and `UPDATE_INDEX_FILE_EXT` from
code
##########
File path:
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -158,8 +159,6 @@ private[sql] case class CarbonProjectForUpdateCommand(
"for the update key")
}
}
- // handle the clean up of IUD.
- CarbonUpdateUtil.cleanUpDeltaFiles(carbonTable, false)
Review comment:
@QiangCai , @marchpure : In the release notes or upgrade guide, we can
suggest to the user that if update files are present, compact (horizontal)
before upgrading. Just my opinion.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]