[ 
https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202851#comment-15202851
 ] 

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

[~jkrupan]

 Other than the terminology and the wording/documentation of {{SPARSE}} mode, what 
interests me more is how SASI can deal with a {{DENSE}} index, i.e. a few indexed 
values for millions/billions of matching primary keys.

 The original secondary index was not suited to

1. very high cardinality (an index on email to search for a user, for example) because 
it does not scale well with cluster size. In the worst case you'll need to scan 
N/RF nodes to fetch 0 or at most 1 user, so the ratio of effort to result is bad

2. very low cardinality (user gender, for example) because each distinct 
indexed value can have many matching users, which creates ultra-wide rows, 
an anti-pattern
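
To put rough numbers on the two cases above, here is a back-of-the-envelope 
sketch in Python. Only the 13-node cluster size comes from this ticket; the 
replication factor of 3, the one billion users, and the helper names are 
assumptions made up for illustration.

```python
import math


def nodes_to_scan(cluster_size: int, replication_factor: int) -> int:
    """Worst-case number of nodes to query so that every token range is covered once."""
    return math.ceil(cluster_size / replication_factor)


def avg_matches_per_value(total_rows: int, distinct_values: int) -> float:
    """Average number of primary keys stored under one indexed value."""
    return total_rows / distinct_values


# 13 nodes is from the ticket; RF = 3 and the row count are assumed.
cluster_size, replication_factor = 13, 3
total_users = 1_000_000_000

# Case 1 (email-style index): many nodes queried for at most one row.
print(nodes_to_scan(cluster_size, replication_factor))  # 5
print(avg_matches_per_value(total_users, total_users))  # 1.0

# Case 2 (gender-style index): two values share all the rows.
print(avg_matches_per_value(total_users, 2))  # 500000000.0
```

Under these assumptions, the email lookup touches 5 nodes to return at most 
one row, while each gender value maps to about half a billion primary keys.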

 With SASI, although point 1 still holds (that's the common issue with all 
**distributed** index systems, even Solr or ES), I had hoped that limitation 2 
would be lifted, since SASI stores data in its own structures.

> SASI index build leads to massive OOM
> -------------------------------------
>
>                 Key: CASSANDRA-11383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: C* 3.4
>            Reporter: DOAN DuyHai
>         Attachments: CASSANDRA-11383.patch, new_system_log_CMS_8GB_OOM.log, 
> system.log_sasi_build_oom
>
>
> 13 bare metal machines
> - 6 cores CPU (12 HT)
> - 64 GB RAM
> - 4 SSD in RAID0
>  JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
> Data set:
>  - ≈ 100 GB per node
>  - 1.3 TB cluster-wide
>  - ≈ 20 GB for all SASI indices
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
> I created 9 SASI indices
>  - 8 indices on text fields, NonTokenizingAnalyzer, PREFIX mode, 
> case-insensitive
>  - 1 index on a numeric field, SPARSE mode
>  After a while, the nodes just went OOM.
>  I attach log files. You can see a lot of GC happening while index segments 
> are flushed to disk. At some point the node OOMs ...
> /cc [~xedin]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
