[ 
https://issues.apache.org/jira/browse/CASSANDRA-11383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203172#comment-15203172
 ] 

DOAN DuyHai commented on CASSANDRA-11383:
-----------------------------------------

[~xedin]

Last update from the testing. I put the cluster in *ideal* conditions as you 
recommended.

JVM settings:
 - CMS
 - Xmx8g, Xms8G
 
C* settings:
 - concurrent_compactors: 6

Test conditions:
 - cluster *idle* (no write, no read)
 - LCS with *sstable_size_in_mb* = 1024 (1Gb)
 - *no compaction ongoing* (took a whole night to compact for LCS)
 - {{CREATE CUSTOM INDEX resource_period_end_month_int_idx ON 
sharon.resource_bench (period_end_month_int) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = \{'mode': 
'SPARSE'\};}}

Observations:
 - I/O idle, CPU not exceeding 20% on average 
(http://postimg.org/image/f664wm8dp/)
 - {{nodetool compactionstats}} only show 1 index rebuild ongoing per node  
{noformat}
id                                   compaction type       keyspace table       
   completed  total       unit  progress
d8b4f4b0-ee6a-11e5-81f5-bd5584064785 Secondary index build sharon   
resource_bench 9535985318 18920482745 bytes 50.40%
id                                   compaction type       keyspace table       
   completed  total       unit  progress
d8b65440-ee6a-11e5-b44b-4deeb5ac98a3 Secondary index build sharon   
resource_bench 9464081317 20988668046 bytes 45.09%
id                                   compaction type       keyspace table       
   completed  total       unit  progress
d8b3bc30-ee6a-11e5-a152-db40f4fbe6b8 Secondary index build sharon   
resource_bench 9471325678 17061191471 bytes 55.51%
id                                   compaction type       keyspace table       
   completed  total       unit  progress
d8b45870-ee6a-11e5-b26b-53ed13e9667e Secondary index build sharon   
resource_bench 9120598050 18921737677 bytes 48.20%
id                                   compaction type       keyspace table       
   completed  total       unit  progress
d8b45870-ee6a-11e5-b2a3-331c04173c53 Secondary index build sharon   
resource_bench 8943568835 20591008789 bytes 43.43%
id                                   compaction type       keyspace table       
   completed   total       unit  progress
d8b47f80-ee6a-11e5-9fc8-0597212274c1 Secondary index build sharon   
resource_bench 10172038156 21422242706 bytes 47.48%
id                                   compaction type       keyspace table       
   completed   total       unit  progress
d8b34700-ee6a-11e5-a642-6dee841e75e5 Secondary index build sharon   
resource_bench 10161205385 18730171060 bytes 54.25%
id                                   compaction type       keyspace table       
   completed  total       unit  progress
d8b6f080-ee6a-11e5-8da4-bd70732fdab1 Secondary index build sharon   
resource_bench 9961529350 21294352899 bytes 46.78%
id                                   compaction type       keyspace table       
   completed  total       unit  progress
d8b43160-ee6a-11e5-8ac9-f59d626eedfa Secondary index build sharon   
resource_bench 9160286080 22153527929 bytes 41.35%
id                                   compaction type       keyspace table       
   completed  total       unit  progress
d8b51bc0-ee6a-11e5-8aa0-b9e611280aba Secondary index build sharon   
resource_bench 9397690505 22791700212 bytes 41.23%
id                                   compaction type       keyspace table       
   completed   total       unit  progress
d8b542d0-ee6a-11e5-8521-fbd14b018db6 Secondary index build sharon   
resource_bench 10029096174 18910334578 bytes 53.04%
id                                   compaction type       keyspace table       
   completed   total       unit  progress
d8b40a50-ee6a-11e5-a7b2-4b114ced0935 Secondary index build sharon   
resource_bench 10118551269 16938426769 bytes 59.74%
id                                   compaction type       keyspace table       
   completed  total       unit  progress
d8b039c0-ee6a-11e5-9a98-ff9a6f2af762 Secondary index build sharon   
resource_bench 9003236945 18252472495 bytes 49.33%
{noformat}
 - *there are still A LOT of GC*
{noformat}
INFO  [Service Thread] 2016-03-20 09:46:44,695 GCInspector.java:284 - ParNew GC 
in 455ms.  CMS Old Gen: 2964960608 -> 3487392640; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:46:47,250 GCInspector.java:284 - ParNew GC 
in 460ms.  CMS Old Gen: 3487392640 -> 3990379824; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125829120 -> 125828160
INFO  [Service Thread] 2016-03-20 09:46:49,452 GCInspector.java:284 - ParNew GC 
in 414ms.  CMS Old Gen: 3990379824 -> 4445691424; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125828160 -> 125827840
INFO  [Service Thread] 2016-03-20 09:46:52,328 GCInspector.java:284 - ParNew GC 
in 484ms.  CMS Old Gen: 4445691424 -> 4968532112; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125827840 -> 125827440
INFO  [Service Thread] 2016-03-20 09:46:54,940 GCInspector.java:284 - ParNew GC 
in 495ms.  CMS Old Gen: 4968532112 -> 5462199544; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125827440 -> 125829120
INFO  [Service Thread] 2016-03-20 09:46:57,406 GCInspector.java:284 - ParNew GC 
in 463ms.  CMS Old Gen: 5462199544 -> 5924296504; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125829120 -> 125828160
INFO  [Service Thread] 2016-03-20 09:47:12,521 GCInspector.java:284 - ParNew GC 
in 670ms.  CMS Old Gen: 5924296504 -> 5985242088; Par Survivor Space: 125828160 
-> 125827760
WARN  [Service Thread] 2016-03-20 09:47:12,528 GCInspector.java:282 - 
ConcurrentMarkSweep GC in 11108ms.  CMS Old Gen: 5985242088 -> 4121205176; Par 
Eden Space: 1006632960 -> 0; Par Survivor Space: 125827760 -> 0
INFO  [Service Thread] 2016-03-20 09:47:41,050 GCInspector.java:284 - ParNew GC 
in 209ms.  CMS Old Gen: 4748643432 -> 4955516288; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:29,246 GCInspector.java:284 - ParNew GC 
in 206ms.  CMS Old Gen: 3790280880 -> 3996984480; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:39,634 GCInspector.java:284 - ParNew GC 
in 272ms.  CMS Old Gen: 4361111240 -> 4620112304; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 45286592 -> 125829120
INFO  [Service Thread] 2016-03-20 09:49:41,290 GCInspector.java:284 - ParNew GC 
in 289ms.  CMS Old Gen: 4620112304 -> 4937794000; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:43,160 GCInspector.java:284 - ParNew GC 
in 383ms.  CMS Old Gen: 4937794000 -> 5426684640; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:45,309 GCInspector.java:284 - ParNew GC 
in 431ms.  CMS Old Gen: 5426684640 -> 5947461904; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:47,370 GCInspector.java:284 - ParNew GC 
in 402ms.  CMS Old Gen: 5947461904 -> 6398462800; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:49,857 GCInspector.java:284 - ParNew GC 
in 423ms.  CMS Old Gen: 5444204224 -> 5937269640; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:52,407 GCInspector.java:284 - ParNew GC 
in 455ms.  CMS Old Gen: 3157064816 -> 3668162472; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:54,572 GCInspector.java:284 - ParNew GC 
in 400ms.  CMS Old Gen: 2840515104 -> 3287142872; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:57,162 GCInspector.java:284 - ParNew GC 
in 465ms.  CMS Old Gen: 3287142872 -> 3792393832; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:49:59,862 GCInspector.java:284 - ParNew GC 
in 470ms.  CMS Old Gen: 3792393832 -> 4304257848; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125829120 -> 125828240
INFO  [Service Thread] 2016-03-20 09:50:02,071 GCInspector.java:284 - ParNew GC 
in 418ms.  CMS Old Gen: 4304257848 -> 4743542344; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125828240 -> 125828000
INFO  [Service Thread] 2016-03-20 09:50:04,859 GCInspector.java:284 - ParNew GC 
in 502ms.  CMS Old Gen: 4743542344 -> 5263286088; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125828000 -> 125827760
INFO  [Service Thread] 2016-03-20 09:50:07,747 GCInspector.java:284 - ParNew GC 
in 492ms.  CMS Old Gen: 5263286088 -> 5773111032; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125827760 -> 125827520
INFO  [Service Thread] 2016-03-20 09:50:10,679 GCInspector.java:284 - ParNew GC 
in 383ms.  CMS Old Gen: 5773111032 -> 6032633936; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125827520 -> 125827280
INFO  [Service Thread] 2016-03-20 09:50:14,371 GCInspector.java:284 - ParNew GC 
in 212ms.  CMS Old Gen: 5728973400 -> 5854698504; Par Eden Space: 1006632960 -> 
0; Par Survivor Space: 125827280 -> 125826960
INFO  [Service Thread] 2016-03-20 09:50:38,495 GCInspector.java:284 - ParNew GC 
in 205ms.  CMS Old Gen: 5019081080 -> 5235117656; Par Eden Space: 1006632960 -> 
0;
INFO  [Service Thread] 2016-03-20 09:51:05,133 GCInspector.java:284 - ParNew GC 
in 206ms.  CMS Old Gen: 947772192 -> 1118731384; Par Eden Space: 1006632960 -> 
0;
{noformat}

The GC activity is quite worrisome because if the cluster was not IDLE and have 
active reads/writes/compactions, I would bet that such GC will bring it down 
completely ...

Do you think memory management for index build is going to be fixed along side 
with current OOM issue or do you want me to open a separate JIRA ticket to 
address the issue ?

 I can upload all the necessary logs (system.log, gc.log). I took care to clean 
all the logs right *before* starting the index creation so that we can have a 
clear view and log messages for this process

> SASI index build leads to massive OOM
> -------------------------------------
>
>                 Key: CASSANDRA-11383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11383
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: C* 3.4
>            Reporter: DOAN DuyHai
>         Attachments: CASSANDRA-11383.patch, new_system_log_CMS_8GB_OOM.log, 
> system.log_sasi_build_oom
>
>
> 13 bare metal machines
> - 6 cores CPU (12 HT)
> - 64Gb RAM
> - 4 SSD in RAID0
>  JVM settings:
> - G1 GC
> - Xms32G, Xmx32G
> Data set:
>  - ≈ 100Gb/per node
>  - 1.3 Tb cluster-wide
>  - ≈ 20Gb for all SASI indices
> C* settings:
> - concurrent_compactors: 1
> - compaction_throughput_mb_per_sec: 256
> - memtable_heap_space_in_mb: 2048
> - memtable_offheap_space_in_mb: 2048
> I created 9 SASI indices
>  - 8 indices with text field, NonTokenizingAnalyser,  PREFIX mode, 
> case-insensitive
>  - 1 index with numeric field, SPARSE mode
>  After a while, the nodes just gone OOM.
>  I attach log files. You can see a lot of GC happening while index segments 
> are flush to disk. At some point the node OOM ...
> /cc [~xedin]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to