Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2864#discussion_r228703510
--- Diff:
integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
---
@@ -1171,12 +1171,27 @@ object CarbonDataRDDFactory {
.ensureExecutorsAndGetNodeList(blockList, sqlContext.sparkContext)
val skewedDataOptimization = CarbonProperties.getInstance()
.isLoadSkewedDataOptimizationEnabled()
- val loadMinSizeOptimization = CarbonProperties.getInstance()
- .isLoadMinSizeOptimizationEnabled()
// get user ddl input the node loads the smallest amount of data
- val expectedMinSizePerNode = carbonLoadModel.getLoadMinSize()
+ val carbonTable =
carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
+ val loadMinSize =
carbonTable.getTableInfo.getFactTable.getTableProperties.asScala
--- End diff --
It seems that you get the load-min-size only from the table property, but
you claimed that carbon also supports specifying it through loadOption.
The expected procedure is:
1. get the loadMinSize from LoadOption; if it is zero, go to step 2,
otherwise go to step 4;
2. get it from TableProperty; if it is zero, go to step 3, otherwise go to
step 4;
3. use another strategy;
4. use NODE_MIN_SIZE_FIRST.
Have you handled this?
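The fallback order described above could be sketched roughly like this (a
hypothetical helper, not the actual CarbonData code; names such as
`resolveLoadMinSize` and the treatment of zero as "unset" are assumptions):

```scala
object LoadMinSizeResolver {

  // Resolves the effective load-min-size following the precedence above:
  // LoadOption first, then TableProperty, else fall back to another strategy.
  // Returns the chosen size and whether NODE_MIN_SIZE_FIRST should apply.
  def resolveLoadMinSize(fromLoadOption: Long, fromTableProperty: Long): (Long, Boolean) = {
    if (fromLoadOption > 0) {
      // step 1 -> step 4: user DDL/load option wins
      (fromLoadOption, true)
    } else if (fromTableProperty > 0) {
      // step 2 -> step 4: fall back to the table property
      (fromTableProperty, true)
    } else {
      // step 3: neither is set, use another strategy
      (0L, false)
    }
  }
}
```

For example, `resolveLoadMinSize(0L, 50L)` would pick the table-property
value only because the load option is unset.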
---