[ https://issues.apache.org/jira/browse/CASSANDRA-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704022#comment-13704022 ]

Jeremy Hanna commented on CASSANDRA-5741:
-----------------------------------------

One thing you can do in the meantime is to unthrottle compaction throughput to
maximize IO for building indexes, since index builds are a type of compaction.
So during the bulk load you can run nodetool setcompactionthroughput 0 to
unthrottle. I know that doesn't address what you're after, but it does help to
get through the index builds faster for now, fwiw.
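A minimal Python sketch of that workaround, assuming `nodetool` is on the PATH and can reach each node with `-h <host>`. Because a nodetool command applies to one node at a time, the sketch loops over a list of hosts (the host names you pass in are your own). The restore value of 16 MB/s is an assumption about the stock `compaction_throughput_mb_per_sec` in cassandra.yaml, so substitute whatever your cluster actually uses. The `run` parameter is only there so the nodetool call can be replaced in tests.

```python
import subprocess
from contextlib import contextmanager

# Assumption: 16 MB/s is the default compaction_throughput_mb_per_sec in
# cassandra.yaml; restore the value your cluster is really configured with.
DEFAULT_THROUGHPUT_MB = 16


def set_compaction_throughput(hosts, mb_per_sec, run=subprocess.run):
    """Call `nodetool setcompactionthroughput` on each host (0 = unthrottled)."""
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "setcompactionthroughput", str(mb_per_sec)]
        run(cmd, check=True)  # raises CalledProcessError if nodetool fails
        commands.append(cmd)
    return commands


@contextmanager
def unthrottled_compaction(hosts, restore_to=DEFAULT_THROUGHPUT_MB, run=subprocess.run):
    """Unthrottle compaction (and therefore index builds) during a bulk load."""
    set_compaction_throughput(hosts, 0, run=run)
    try:
        yield
    finally:
        # Put the throttle back even if the bulk load job fails or times out.
        set_compaction_throughput(hosts, restore_to, run=run)
```

Wrapping the job submission in `with unthrottled_compaction(["node1", "node2"]):` keeps compaction unthrottled only for the duration of the bulk load, and the `finally` clause restores the throttle even when the job raises.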
                
> Provide a way to disable automatic index rebuilds during bulk loading
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-5741
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5741
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>    Affects Versions: 1.2.6
>            Reporter: Jim Zamata
>
> When using the BulkLoadOutputFormat the actual streaming of the SSTables into 
> Cassandra is fast, but the index rebuilds can take several minutes. Cassandra 
> does not send the response until after all of the rebuilds for a streaming 
> session complete. This causes the tasks to appear to hang at 100%, since the 
> record writer streams the files in its close method.  If the rebuilding 
> process takes too long, the tasks can actually time out.
> Many SQL databases provide bulk insert utilities that disable index updates 
> to allow large amounts of data to be added quickly.  This functionality would 
> serve a similar purpose.
> An alternative might be an option that would allow the session to return once 
> the SSTables have been successfully imported, without waiting for the index 
> builds to complete.  However, I have noticed heavy CPU loads during the index 
> rebuilds, so bulk-load performance might be better if this step could be 
> deferred until after all of the data is loaded. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
