[
https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15202876#comment-15202876
]
Jack Krupansky commented on CASSANDRA-11383:
--------------------------------------------
The int field could easily be made a text field if that would make SASI work
better (you could even do a prefix query by year then).
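For illustration only, a minimal CQL sketch of that idea; the table, column,
and index names (resource_usage, end_of_month, resource_eom_idx) are
hypothetical, not from the reporter's schema:

    CREATE TABLE resource_usage (
        resource_id text PRIMARY KEY,
        end_of_month text,  -- e.g. '2016-03-31' stored as text instead of an int
        usage bigint
    );

    CREATE CUSTOM INDEX resource_eom_idx ON resource_usage (end_of_month)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = { 'mode': 'PREFIX' };

    -- prefix query by year
    SELECT * FROM resource_usage WHERE end_of_month LIKE '2016%';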
Point 1 is precisely what SASI SPARSE is designed for. It is also what
Materialized Views (formerly Global Indexes) are for, and an MV is even better
here because it eliminates the need to scan multiple nodes: the rows are
collected under the new partition key, which can include the indexed data
value.
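Again just a sketch of the MV alternative, reusing the hypothetical
resource_usage table above (the view name is made up):

    CREATE MATERIALIZED VIEW resource_usage_by_eom AS
        SELECT * FROM resource_usage
        WHERE end_of_month IS NOT NULL AND resource_id IS NOT NULL
        PRIMARY KEY (end_of_month, resource_id);

    -- all rows for a given month are collected under one partition,
    -- so this is a single-partition read rather than a multi-node scan
    SELECT * FROM resource_usage_by_eom WHERE end_of_month = '2016-03-31';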
You're using cardinality backwards - it is supposed to be a measure of the
number of distinct values in a column, not the number of rows containing each
value. See: https://en.wikipedia.org/wiki/Cardinality_%28SQL_statements%29.
Granted, in ERD terms cardinality is the count of rows in a second table for
each column value in a given table (one to n, n to one, etc.), but in the
context of an index only one table is involved; you could consider the index
itself to be a table, but that would be a little odd. In any case, it is best
to stick with the standard SQL meaning: the cardinality of the data values in
a column.
So, to be clear, an email address is high cardinality and gender is low
cardinality. And the end-of-month int field is low cardinality, or not dense
in the original SASI doc terminology.
> SASI index build leads to massive OOM
> -------------------------------------
>
> Key: CASSANDRA-11383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
> Project: Cassandra
> Issue Type: Bug
> Components: CQL
> Environment: C* 3.4
> Reporter: DOAN DuyHai
> Attachments: CASSANDRA-11383.patch, new_system_log_CMS_8GB_OOM.log,
> system.log_sasi_build_oom
>
>
> 13 bare metal machines
> - 6 cores CPU (12 HT)
> - 64Gb RAM
> - 4 SSD in RAID0
> JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
> Data set:
> - ≈ 100Gb per node
> - 1.3 Tb cluster-wide
> - ≈ 20Gb for all SASI indices
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
> I created 9 SASI indices
> - 8 indices with text field, NonTokenizingAnalyzer, PREFIX mode,
> case-insensitive
> - 1 index with numeric field, SPARSE mode
> After a while, the nodes just went OOM.
> I have attached the log files. You can see a lot of GC happening while index
> segments are flushed to disk. At some point the node OOMs ...
> /cc [~xedin]