Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228703110
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
@@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory {
.ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
val skewedDataOptimization = CarbonProperties.getInstance()
.isLoadSkewedDataOptimizationEnabled()
- val loadMinSizeOptimization = CarbonProperties.getInstance()
- .isLoadMinSizeOptimizationEnabled()
// get user ddl input the node loads the smallest amount of data
- val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
+ val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+ val loadMinSize = carbonTable.getTableInfo.getFactTable.getTableProperties.asScala
+ .getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, "")
+ var expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
--- End diff --
There is no need to add another variable `expectedMinSizePerNode`. At line
1190, we can simply use `loadMinSize` to decide which branch to take: if it is
zero, use 'BLOCK_SIZE_FIRST'; otherwise, use 'NODE_MIN_SIZE_FIRST'.
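A minimal sketch of the suggested branching, extracted from the surrounding code. The helper name `chooseStrategy` and the treatment of the empty-string default (from `getOrElse(..., "")`) are assumptions for illustration, not the actual CarbonData code:

```scala
// Hypothetical helper illustrating the reviewer's suggestion: branch on
// loadMinSize directly instead of introducing expectedMinSizePerNode.
// Assumption: an unset or zero min-size means the block-size-first strategy.
def chooseStrategy(loadMinSize: String): String =
  if (loadMinSize.isEmpty || loadMinSize == "0") "BLOCK_SIZE_FIRST"
  else "NODE_MIN_SIZE_FIRST"
```

With this shape, the extra variable disappears and the strategy choice reads directly off the table property.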
---