[ 
https://issues.apache.org/jira/browse/CASSANDRA-12796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653748#comment-15653748
 ] 

ASF GitHub Bot commented on CASSANDRA-12796:
--------------------------------------------

GitHub user mmajercik opened a pull request:

    https://github.com/apache/cassandra/pull/83

    12796 2.2

    This is a proposed patch for CASSANDRA-12796

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mmajercik/cassandra 12796-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cassandra/pull/83.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #83
    
----
commit de57fc5ddc3fffdd6a1eed2dee53a638d5053fab
Author: mmajercik <mmajer...@specter.firstmobileaffiliate.com>
Date:   2016-10-14T13:54:02Z

    Changed operation group granularity to page rather than partition when 
    rebuilding secondary index

commit f5d4f1cfb8dbaf550bb1685408279cd6935d3cbf
Author: mmajercik <mmajer...@specter.firstmobileaffiliate.com>
Date:   2016-11-10T08:22:24Z

    replaced tabs with spaces

----


> Heap exhaustion when rebuilding secondary index over a table with wide 
> partitions
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12796
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12796
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Milan Majercik
>            Priority: Critical
>
> We have a table with rather wide partitions and a secondary index defined 
> over it. As soon as we try to rebuild the index, we observe exhaustion of 
> the Java heap and an eventual OOM error. After a lengthy investigation we 
> found the culprit, which appears to be the wrong granularity of barrier 
> issuance in method {{org.apache.cassandra.db.Keyspace.indexRow}}:
> {code}
>         try (OpOrder.Group opGroup = cfs.keyspace.writeOrder.start())
>         {
>             Set<SecondaryIndex> indexes = cfs.indexManager.getIndexesByNames(idxNames);
>             Iterator<ColumnFamily> pager = QueryPagers.pageRowLocally(cfs, key.getKey(), DEFAULT_PAGE_SIZE);
>             while (pager.hasNext())
>             {
>                 ColumnFamily cf = pager.next();
>                 ColumnFamily cf2 = cf.cloneMeShallow();
>                 for (Cell cell : cf)
>                 {
>                     if (cfs.indexManager.indexes(cell.name(), indexes))
>                         cf2.addColumn(cell);
>                 }
>                 cfs.indexManager.indexRow(key.getKey(), cf2, opGroup);
>             }
>         }
> {code}
> Please note that the operation group spans a whole partition of the source 
> table. This is a problem for tables with wide partitions, because the flush 
> runnable ({{org.apache.cassandra.db.ColumnFamilyStore.Flush.run()}}) won't 
> proceed with flushing the secondary index memtable until all operations 
> started before the most recently issued barrier have completed. In our 
> situation the flush runnable waits until the whole wide partition has been 
> indexed into the secondary index memtable before flushing it. This exhausts 
> the heap and eventually causes an OOM error.
> After we changed the granularity of barrier issuance in method 
> {{org.apache.cassandra.db.Keyspace.indexRow}} from a table partition to a 
> query page (see 
> [https://github.com/mmajercik/cassandra/commit/7e10e5aa97f1de483c2a5faf867315ecbf65f3d6?diff=unified]),
> the secondary index rebuild started to work without heap exhaustion.
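
The effect of the group granularity described above can be illustrated with a
small, self-contained toy model. This is only a sketch under stated
assumptions, not Cassandra code: ToyWriteOrder, Group and ToyMemtable are
hypothetical stand-ins for OpOrder, OpOrder.Group and the secondary index
memtable, the model is single-threaded, and a "flush" is allowed only while
no operation group is open. The page count, page size and flush threshold are
made-up numbers.

```java
class OpGroupGranularityDemo {
    // Hypothetical stand-in for the write order: it only counts open groups.
    static final class ToyWriteOrder {
        private int openGroups;

        Group start() {
            openGroups++;
            return new Group(this);
        }

        // Simplification: a barrier can complete only when no group is open.
        boolean barrierCanComplete() {
            return openGroups == 0;
        }
    }

    // Hypothetical stand-in for an operation group; closing it ends the operation.
    static final class Group implements AutoCloseable {
        private final ToyWriteOrder order;

        Group(ToyWriteOrder order) {
            this.order = order;
        }

        @Override
        public void close() {
            order.openGroups--;
        }
    }

    // Hypothetical stand-in for the index memtable; tracks peak buffered cells.
    static final class ToyMemtable {
        int buffered;
        int peak;

        void add(int cells) {
            buffered += cells;
            peak = Math.max(peak, buffered);
        }

        // Flush only if the threshold is reached and the barrier can complete.
        void tryFlush(ToyWriteOrder order, int threshold) {
            if (buffered >= threshold && order.barrierCanComplete()) {
                buffered = 0;
            }
        }
    }

    // One group for the whole partition: no flush can happen until the last page.
    static int peakPerPartitionGroup(int pages, int cellsPerPage, int threshold) {
        ToyWriteOrder order = new ToyWriteOrder();
        ToyMemtable memtable = new ToyMemtable();
        try (Group group = order.start()) {
            for (int p = 0; p < pages; p++) {
                memtable.add(cellsPerPage);
                memtable.tryFlush(order, threshold); // always blocked here
            }
        }
        memtable.tryFlush(order, threshold);
        return memtable.peak;
    }

    // One group per page: a flush can happen between any two pages.
    static int peakPerPageGroup(int pages, int cellsPerPage, int threshold) {
        ToyWriteOrder order = new ToyWriteOrder();
        ToyMemtable memtable = new ToyMemtable();
        for (int p = 0; p < pages; p++) {
            try (Group group = order.start()) {
                memtable.add(cellsPerPage);
            }
            memtable.tryFlush(order, threshold);
        }
        return memtable.peak;
    }

    public static void main(String[] args) {
        System.out.println(peakPerPartitionGroup(100, 1000, 5000)); // prints 100000
        System.out.println(peakPerPageGroup(100, 1000, 5000));      // prints 5000
    }
}
```

In this model the per-partition group lets the buffered data grow with the
partition width, while the per-page group keeps the peak bounded by the flush
threshold, which mirrors the behaviour reported in the issue.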



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
