[GitHub] carbondata pull request #2632: [CARBONDATA-2206] Enhanced document on Lucene...

xuchuanyin Sun, 09 Sep 2018 05:34:24 -0700

Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2632#discussion_r216156187
  
    --- Diff: docs/datamap/lucene-datamap-guide.md ---
    @@ -70,42 +66,38 @@ It will show all DataMaps created on main table.
       USING 'lucene'
       DMPROPERTIES ('INDEX_COLUMNS' = 'name, country',)
       ```
    -
    -**DMProperties**
    -1. INDEX_COLUMNS: The list of string columns on which lucene creates indexes.
    -2. FLUSH_CACHE: size of the cache to maintain in Lucene writer, if specified then it tries to
    -   aggregate the unique data till the cache limit and flush to Lucene. It is best suitable for low
    -   cardinality dimensions.
    -3. SPLIT_BLOCKLET: when made as true then store the data in blocklet wise in lucene , it means new
    -   folder will be created for each blocklet, thus, it eliminates storing blockletid in lucene and
    -   also it makes lucene small chunks of data.
    +**Properties for Lucene DataMap**
    +
    +| Property | Is Required | Default Value | Description |
    +|-------------|----------|--------|---------|
    +| INDEX_COLUMNS | YES |  | Carbondata will generate Lucene index on these string columns. |
    +| FLUSH_CACHE | NO | -1 | It defines the size of the cache to maintain in Lucene writer. If specified, it tries to aggregate the unique data till the cache limit and then flushes to Lucene. It is recommended to define FLUSH_CACHE for low cardinality dimensions.|
    +| SPLIT_BLOCKLET | NO | TRUE | When SPLIT_BLOCKLET is defined as "TRUE", folders are created per blocklet by using the blockletID. This eliminates indexing blockletID by lucene by storing only pageID and rowID, thus reducing the size of indexes created by lucene. |
    +
    +**Folder Structure for lucene datamap:**
    +  * Location of index files when Split BlockletId is TRUE: 
    +    
    +    tablePath/dataMapName/SegmentID/blockName/blockletID/..
    +
    +  * Location of index files when Split BlockletId is FALSE:
    +    
    +    tablePath/dataMapName/SegmentID/blockName/..
        
     ## Loading data
    -When loading data to main table, lucene index files will be generated for all the
    -index_columns(String Columns) given in DMProperties which contains information about the data
    -location of index_columns. These index files will be written inside a folder named with datamap name
    -inside each segment folders.
    +When loading data to main table, lucene index files will be generated for all the index_columns(String Columns) given in DMProperties which contains information about the data location of index_columns. These index files will be written into the path mentioned above.
    --- End diff --
    
    for all the index_columns(String Columns)
    ---
    I think there is no need to mention 'String Columns' again, since it is already mentioned in DMProperties.
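
For readers following the folder-structure part of the diff, here is a minimal sketch of how the two layouts differ depending on SPLIT_BLOCKLET. It only mirrors the two path patterns quoted in the diff; the function name, its parameters, and the sample values are hypothetical and are not part of CarbonData's API.

```python
from pathlib import PurePosixPath


def lucene_index_dir(table_path, datamap_name, segment_id, block_name,
                     blocklet_id=None, split_blocklet=True):
    """Build the index folder path described in the diff (illustrative only).

    Hypothetical helper: it follows the documented patterns
      tablePath/dataMapName/SegmentID/blockName/blockletID  (SPLIT_BLOCKLET = TRUE)
      tablePath/dataMapName/SegmentID/blockName             (SPLIT_BLOCKLET = FALSE)
    """
    base = PurePosixPath(table_path) / datamap_name / str(segment_id) / block_name
    if split_blocklet:
        # One folder per blocklet, so the blocklet ID is part of the path.
        if blocklet_id is None:
            raise ValueError("blocklet_id is required when split_blocklet is True")
        return base / str(blocklet_id)
    # Without splitting, all blocklets of a block share one folder.
    return base


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(lucene_index_dir("/store/db/main_table", "dm", 0, "part-0-0", 3))
    print(lucene_index_dir("/store/db/main_table", "dm", 0, "part-0-0",
                           split_blocklet=False))
```

With SPLIT_BLOCKLET enabled, the blocklet ID is carried by the folder name, which is consistent with the table's statement that only pageID and rowID need to be stored in the index.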
