Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2864#discussion_r228703110
  
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory {
           .ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
         val skewedDataOptimization = CarbonProperties.getInstance()
           .isLoadSkewedDataOptimizationEnabled()
    -    val loadMinSizeOptimization = CarbonProperties.getInstance()
    -      .isLoadMinSizeOptimizationEnabled()
         // get user ddl input the node loads the smallest amount of data
    -    val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
    +    val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
    +    val loadMinSize = carbonTable.getTableInfo.getFactTable.getTableProperties.asScala
    +      .getOrElse(CarbonCommonConstants.CARBON_LOAD_MIN_SIZE_INMB, "")
    +    var expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
    --- End diff --
    
    There is no need to add another variable `expectedMinSizePerNode`. At line 1190 we can simply use `loadMinSize` to determine which branch to take: if it is zero, use `BLOCK_SIZE_FIRST`; otherwise, use `NODE_MIN_SIZE_FIRST`.
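    The suggested simplification can be sketched as follows. This is a hypothetical, self-contained illustration, not the actual CarbonData code: the strategy names come from the review comment, while `chooseStrategy` and the empty-string convention for an unset table property are assumptions for the sketch.

    ```scala
    // Sketch of the reviewer's suggestion: branch directly on loadMinSize
    // instead of introducing a second variable expectedMinSizePerNode.
    object LoadBalanceSketch {
      // Strategy names taken from the review comment.
      sealed trait BlockAssignmentStrategy
      case object BLOCK_SIZE_FIRST extends BlockAssignmentStrategy
      case object NODE_MIN_SIZE_FIRST extends BlockAssignmentStrategy

      // loadMinSize would come from the table property
      // CARBON_LOAD_MIN_SIZE_INMB (empty string when the property is unset).
      def chooseStrategy(loadMinSize: String): BlockAssignmentStrategy = {
        // Treat an unset or zero value as "no minimum load size per node".
        val sizeInMb = if (loadMinSize.isEmpty) 0L else loadMinSize.toLong
        if (sizeInMb == 0L) BLOCK_SIZE_FIRST else NODE_MIN_SIZE_FIRST
      }
    }
    ```
    
    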

