marchpure commented on a change in pull request #3999:
URL: https://github.com/apache/carbondata/pull/3999#discussion_r515819017
##########
File path:
processing/src/main/java/org/apache/carbondata/processing/merger/CompactionType.java
##########
@@ -26,7 +26,6 @@
public enum CompactionType {
MINOR,
MAJOR,
- IUD_UPDDEL_DELTA,
Review comment:
I have modified code according to your suggestion
##########
File path: docs/configuration-parameters.md
##########
@@ -104,8 +104,7 @@ This section provides the details of all the configurations
required for the Car
| carbon.number.of.cores.while.compacting | 2 | Number of cores to be used
while compacting data. This also determines the number of threads to be used to
read carbondata files in parallel. |
| carbon.compaction.level.threshold | 4, 3 | Each CarbonData load will create
one segment, if every load is small in size it will generate many small file
over a period of time impacting the query performance. This configuration is
for minor compaction which decides how many segments to be merged.
Configuration is of the form (x,y). Compaction will be triggered for every x
segments and form a single level 1 compacted segment. When the number of
compacted level 1 segments reach y, compaction will be triggered again to merge
them to form a single level 2 segment. For example: If it is set as 2, 3 then
minor compaction will be triggered for every 2 segments. 3 is the number of
level 1 compacted segments which is further compacted to new segment. **NOTE:**
When ***carbon.enable.auto.load.merge*** is **true**, configuring higher values
cause overall data loading time to increase as compaction will be triggered
after data loading is complete but status is not returned till compaction is c
omplete. But compacting more number of segments can increase query
performance. Hence optimal values needs to be configured based on the business
scenario. Valid values are between 0 to 100. |
| carbon.major.compaction.size | 1024 | To improve query performance and all
the segments can be merged and compacted to a single segment upto configured
size. This Major compaction size can be configured using this parameter. Sum of
the segments which is below this threshold will be merged. This value is
expressed in MB. |
-| carbon.horizontal.compaction.enable | true | CarbonData supports
DELETE/UPDATE functionality by creating delta data files for existing
carbondata files. These delta files would grow as more number of DELETE/UPDATE
operations are performed. Compaction of these delta files are termed as
horizontal compaction. This configuration is used to turn ON/OFF horizontal
compaction. After every DELETE and UPDATE statement, horizontal compaction may
occur in case the delta (DELETE/ UPDATE) files becomes more than specified
threshold. **NOTE:** Having many delta files will reduce the query performance
as scan has to happen on all these files before the final state of data can be
decided. Hence it is advisable to keep horizontal compaction enabled and
configure reasonable values to
***carbon.horizontal.UPDATE.compaction.threshold*** and
***carbon.horizontal.DELETE.compaction.threshold*** |
-| carbon.horizontal.update.compaction.threshold | 1 | This configuration
specifies the threshold limit on number of UPDATE delta files within a segment.
In case the number of delta files goes beyond the threshold, the UPDATE delta
files within the segment becomes eligible for horizontal compaction and are
compacted into single UPDATE delta file. Values range between 1 to 10000. |
+| carbon.horizontal.compaction.enable | true | CarbonData supports
DELETE/UPDATE functionality by creating delta data files for existing
carbondata files. These delta files would grow as more number of DELETE/UPDATE
operations are performed. Compaction of these delete delta files are termed as
horizontal compaction. This configuration is used to turn ON/OFF horizontal
compaction. After every DELETE and UPDATE statement, horizontal compaction may
occur in case the delete delta files becomes more than specified threshold.
**NOTE:** Having many delta files will reduce the query performance as scan has
to happen on all these files before the final state of data can be decided.
Hence it is advisable to keep horizontal compaction enabled and configure
reasonable values to ***carbon.horizontal.UPDATE.compaction.threshold*** and
***carbon.horizontal.DELETE.compaction.threshold*** |
Review comment:
yes. it won't be unused anywhere
##########
File path:
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -373,9 +374,7 @@ public static void cleanStaleDeltaFiles(CarbonTable table,
final String timeStam
@Override
public boolean accept(CarbonFile file) {
String fileName = file.getName();
- return (fileName.endsWith(timeStamp +
CarbonCommonConstants.UPDATE_DELTA_FILE_EXT)
Review comment:
I have modified code according to your suggestion
##########
File path:
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
##########
@@ -808,19 +808,6 @@ private CarbonCommonConstants() {
*/
public static final int NUMBER_OF_SEGMENT_COMPACTED_PERTIME_UPPER_LIMIT =
10000;
- /**
- * Number of Update Delta files which is the Threshold for IUD compaction.
- * Only accepted Range is 0 - 10000. Outside this range system will pick
default value.
- */
- @CarbonProperty
- public static final String UPDATE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION =
Review comment:
I have modified code according to your suggestion
##########
File path:
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -268,6 +267,11 @@ private[sql] case class CarbonProjectForUpdateCommand(
// When the table has too many segemnts, it will take a long time.
Review comment:
I have modified code according to your suggestion
##########
File path:
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -268,6 +267,11 @@ private[sql] case class CarbonProjectForUpdateCommand(
// When the table has too many segemnts, it will take a long time.
// So moving it to the end and it is outside of locking.
CarbonUpdateUtil.cleanStaleDeltaFiles(carbonTable, fileTimestamp)
Review comment:
I will raise another PR and give an optimazation in this week
##########
File path:
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -158,8 +159,6 @@ private[sql] case class CarbonProjectForUpdateCommand(
"for the update key")
}
}
- // handle the clean up of IUD.
- CarbonUpdateUtil.cleanUpDeltaFiles(carbonTable, false)
Review comment:
Clean Files Command maybe a better solution.
##########
File path:
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -1050,14 +752,15 @@ object CarbonDataRDDFactory {
}
var done = true
// If the updated data should be added as new segment then update the
segment information
- if (updateModel.isDefined && updateModel.get.loadAsNewSegment) {
+ if (updateModel.isDefined) {
Review comment:
I will raise another PR and give an optimazation in this week
##########
File path:
processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java
##########
@@ -1243,10 +1256,14 @@ public static void
updateTableStatusInCaseOfFailure(String loadName,
SegmentStatusManager.readLoadMetadata(metaDataPath);
boolean ifTableStatusUpdateRequired = false;
for (LoadMetadataDetails loadMetadataDetail : loadMetadataDetails) {
- if (loadMetadataDetail.getSegmentStatus() ==
SegmentStatus.INSERT_IN_PROGRESS && loadName
- .equalsIgnoreCase(loadMetadataDetail.getLoadName())) {
+ if ((loadMetadataDetail.getSegmentStatus() ==
SegmentStatus.INSERT_IN_PROGRESS
+ || loadMetadataDetail.getSegmentStatus() == SegmentStatus.SUCCESS
Review comment:
I have modified code according to your suggestion
##########
File path:
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/mutation/CarbonProjectForUpdateCommand.scala
##########
@@ -268,6 +267,11 @@ private[sql] case class CarbonProjectForUpdateCommand(
// When the table has too many segemnts, it will take a long time.
// So moving it to the end and it is outside of locking.
CarbonUpdateUtil.cleanStaleDeltaFiles(carbonTable, fileTimestamp)
+
+ // Delete The New Inserted Segment
+ CarbonLoaderUtil.updateTableStatusInCaseOfFailure(
Review comment:
I have modified code according to your suggestion
##########
File path:
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
##########
@@ -342,33 +336,8 @@ object CarbonDataRDDFactory {
try {
if (!carbonLoadModel.isCarbonTransactionalTable ||
segmentLock.lockWithRetries()) {
- if (updateModel.isDefined && !updateModel.get.loadAsNewSegment) {
- res = loadDataFrameForUpdate(
- sqlContext,
- dataFrame,
- carbonLoadModel,
- updateModel,
- carbonTable,
- hadoopConf,
- segmentMetaDataAccumulator)
- res.foreach { resultOfSeg =>
- resultOfSeg.foreach { resultOfBlock =>
- if (resultOfBlock._2._1.getSegmentStatus ==
SegmentStatus.LOAD_FAILURE) {
- loadStatus = SegmentStatus.LOAD_FAILURE
- if (resultOfBlock._2._2.failureCauses == FailureCauses.NONE) {
- updateModel.get.executorErrors.failureCauses =
FailureCauses.EXECUTOR_FAILURE
- updateModel.get.executorErrors.errorMsg = "Failure in the
Executor."
- } else {
- updateModel.get.executorErrors = resultOfBlock._2._2
- }
- } else if (resultOfBlock._2._1.getSegmentStatus ==
- SegmentStatus.LOAD_PARTIAL_SUCCESS) {
- loadStatus = SegmentStatus.LOAD_PARTIAL_SUCCESS
- updateModel.get.executorErrors.failureCauses =
resultOfBlock._2._2.failureCauses
- updateModel.get.executorErrors.errorMsg =
resultOfBlock._2._2.errorMsg
- }
- }
- }
+ if (updateModel.isDefined && dataFrame.get.rdd.isEmpty()) {
+ // if the rowtoupdated is empty, do nothing
Review comment:
I have modified code according to your suggestion
##########
File path:
core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
##########
@@ -276,14 +278,13 @@ public static boolean
updateTableMetadataStatus(Set<Segment> updatedSegmentsList
SegmentStatusManager.readLoadMetadata(metaDataFilepath);
for (LoadMetadataDetails loadMetadata : listOfLoadFolderDetailsArray) {
+ if (isUpdateStatusFileUpdateRequired &&
+ loadMetadata.getLoadName().equalsIgnoreCase("0")) {
+ loadMetadata.setUpdateStatusFileName(
+ CarbonUpdateUtil.getUpdateStatusFileName(updatedTimeStamp));
+ }
if (isTimestampUpdateRequired) {
- // we are storing the link between the 2 status files in the
segment 0 only.
Review comment:
I have modified code according to your suggestion
##########
File path:
core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/SegmentIndexFileStore.java
##########
@@ -101,8 +101,8 @@ public SegmentIndexFileStore(Configuration configuration) {
* @param segmentPath
* @throws IOException
*/
- public void readAllIIndexOfSegment(String segmentPath) throws IOException {
- CarbonFile[] carbonIndexFiles = getCarbonIndexFiles(segmentPath,
configuration);
+ public void readAllIIndexOfSegment(String segmentPath, String uuid) throws
IOException {
Review comment:
I have modified code according to your suggestion
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]