[GitHub] carbondata pull request #2632: [CARBONDATA-2206] Enhanced document on Lucene...

sraghunandan Thu, 16 Aug 2018 20:47:52 -0700

Github user sraghunandan commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2632#discussion_r210797397
  
    --- Diff: docs/datamap/lucene-datamap-guide.md ---
    @@ -70,42 +66,38 @@ It will show all DataMaps created on main table.
       USING 'lucene'
       DMPROPERTIES ('INDEX_COLUMNS' = 'name, country',)
       ```
    -
    -**DMProperties**
    -1. INDEX_COLUMNS: The list of string columns on which lucene creates indexes.
    -2. FLUSH_CACHE: size of the cache to maintain in Lucene writer, if specified then it tries to 
    -   aggregate the unique data till the cache limit and flush to Lucene. It is best suitable for low 
    -   cardinality dimensions.
    -3. SPLIT_BLOCKLET: when made as true then store the data in blocklet wise in lucene , it means new 
    -   folder will be created for each blocklet, thus, it eliminates storing blockletid in lucene and 
    -   also it makes lucene small chunks of data.
    +**Properties for Lucene DataMap**
    +
    +| Property | Is Required | Default Value | Description |
    +|-------------|----------|--------|---------|
    +| INDEX_COLUMNS | YES |  | Carbondata will generate Lucene index on these string columns. |
    +| FLUSH_CACHE | NO | -1 | It defines the size of the cache to maintain in Lucene writer. If specified, it tries to aggregate the unique data till the cache limit and then flushes to Lucene. It is recommended to define FLUSH_CACHE for low cardinality dimensions.|
    --- End diff --
    
    The explanation is not clear. Why is it recommended for low cardinality columns?
