Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2864#discussion_r228703382
  
    --- Diff: docs/ddl-of-carbondata.md ---
    @@ -474,7 +475,22 @@ CarbonData DDL statements are documented here,which 
includes:
          be later viewed in table description for reference.
     
          ```
    -       TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords'')
    +       TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords')
    +     ```
    +     
    +   - ##### Load minimum data size
    +     This property determines whether to enable node minumun input data 
size allocation strategy 
    --- End diff --
    
    You can optimize this description like this:
    
    ```
    This property indicates the minimum input data size per node for data 
loading.
    By default it is not enabled. Setting a non-zero integer value will enable 
this feature.
    This property is useful if you have a large cluster and only want a small 
portion of the nodes to process data loading.
    For example, suppose you have a cluster with 10 nodes and the input data is 
about 1GB. Without this property, each node will process about 100MB of input 
data, resulting in at least 10 data files. With this property set to 512, only 
2 nodes will be chosen to process the input data, each with about 512MB of 
input, resulting in about 2 to 4 files depending on the compression ratio.
    Moreover, this property can also be specified in the load options.
    Notice that once you enable this feature, CarbonData will ignore data 
locality while assigning input data to nodes in order to balance the load, 
which will cause more network traffic.
    ```
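
    The node-count arithmetic in the suggested description can be sketched as 
follows. This is illustrative only; `nodes_for_load` is a hypothetical helper 
for the worked example above, not CarbonData's actual allocation code:

    ```python
    import math

    def nodes_for_load(total_mb, num_nodes, min_size_mb=0):
        """Number of nodes chosen for a load, given a minimum per-node
        input size in MB (0, the default, disables the strategy)."""
        if min_size_mb <= 0:
            # feature disabled: spread the input across all nodes
            return num_nodes
        # enough nodes so each gets at least min_size_mb, capped at the cluster size
        return min(num_nodes, max(1, math.ceil(total_mb / min_size_mb)))

    # 1GB of input on a 10-node cluster
    print(nodes_for_load(1024, 10))       # → 10 (default: ~100MB per node)
    print(nodes_for_load(1024, 10, 512))  # → 2  (min 512MB: only 2 nodes chosen)
    ```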

