[GitHub] carbondata pull request #2581: [CARBONDATA-2800][Doc] Add useful tips about ...

xuchuanyin Tue, 31 Jul 2018 02:22:13 -0700

Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2581#discussion_r206454747
  
    --- Diff: docs/datamap/bloomfilter-datamap-guide.md ---
    @@ -103,3 +104,24 @@ If the datamap does not prune blocklets well, you can try to increase the value
     
     ## Data Management With BloomFilter DataMap
     Data management with BloomFilter datamap has no difference with that on Lucene datamap. You can refer to the corresponding section in `CarbonData BloomFilter DataMap`.
    +
    +## Useful Tips
    ++ BloomFilter DataMap is best created on high-cardinality columns.
    ++ BloomFilter datamap requires that the query conditions on index columns are always simple `equal` or `in`,
    + such as 'col1=XX', 'col1 in (XX, YY)'. Otherwise the queries cannot benefit from BloomFilter datamap.
    ++ We can create multiple BloomFilter datamaps on one table,
    + or we can create one BloomFilter datamap that contains multiple index columns.
    + We recommend the latter since the data loading and query performance will be better.
    ++ `BLOOM_FPP` is only the value expected by the user; the actual FPP may be worse.
    + If the BloomFilter datamap does not work well,
    + you can try to increase `BLOOM_SIZE` and decrease `BLOOM_FPP` at the same time.
    + Note that a bigger `BLOOM_SIZE` will increase the size of the index file
    + and a smaller `BLOOM_FPP` will increase runtime calculation while performing queries.
    ++ '0' skipped blocklets for the BloomFilter datamap in the explain output indicates that
    + the BloomFilter datamap does not prune better than the Main datamap.
    --- End diff --
    
    Added an example scenario
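
    As a side illustration of the `BLOOM_SIZE` / `BLOOM_FPP` tip quoted in the diff, here is a minimal sketch using the textbook Bloom filter formulas. It is not CarbonData code. It assumes that `BLOOM_SIZE` acts as the expected number of inserted values and that the filter is sized with the standard optimal-bits and optimal-hash-count formulas; the function names and the sample values (640000 items, FPP 0.00001) are chosen only for illustration.

```python
import math


def bloom_bits(expected_items: int, target_fpp: float) -> int:
    """Bits needed for `expected_items` at `target_fpp` (textbook formula)."""
    return math.ceil(-expected_items * math.log(target_fpp) / math.log(2) ** 2)


def bloom_hashes(n_bits: int, expected_items: int) -> int:
    """Number of hash functions that minimizes FPP for this sizing."""
    return max(1, round(n_bits / expected_items * math.log(2)))


def actual_fpp(n_bits: int, n_hashes: int, inserted_items: int) -> float:
    """Approximate FPP once `inserted_items` values have really been added."""
    return (1.0 - math.exp(-n_hashes * inserted_items / n_bits)) ** n_hashes


if __name__ == "__main__":
    # Assumed sample settings, standing in for BLOOM_SIZE and BLOOM_FPP.
    size, fpp = 640000, 0.00001
    bits = bloom_bits(size, fpp)
    hashes = bloom_hashes(bits, size)
    print("bits:", bits, "hash functions:", hashes)
    # FPP when the data matches the expected size vs. when it is twice as large.
    print("fpp at expected size:", actual_fpp(bits, hashes, size))
    print("fpp at double size:  ", actual_fpp(bits, hashes, 2 * size))
```

    Under these assumptions, a filter that sees more distinct values than it was sized for ends up with a much higher false-positive rate than the configured one, which is one way the actual FPP can be worse than `BLOOM_FPP`. A lower target FPP also raises both the bit count and the number of hash functions, matching the size and runtime trade-off noted in the tips.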
