[
https://issues.apache.org/jira/browse/CASSANDRA-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704022#comment-13704022
]
Jeremy Hanna commented on CASSANDRA-5741:
-----------------------------------------
One thing you can do in the meantime is unthrottle compaction throughput to
maximize IO for building indexes, since index builds are a type of compaction.
During the bulk load you can run nodetool setcompactionthroughput 0 to
unthrottle. I know that doesn't address what you're after, but it does help
get through the index builds faster for now, fwiw.
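
A minimal sketch of the workaround above, wrapping a bulk load so the throttle
is lifted beforehand and put back afterwards. The helper names
(nodetool_cmd, unthrottled_compaction) and the injectable run parameter are
hypothetical, and the restore value of 16 MB/s is an assumption based on the
stock compaction_throughput_mb_per_sec setting; use whatever your nodes are
actually configured with.

```python
import subprocess
from contextlib import contextmanager


def nodetool_cmd(mb_per_sec, host=None):
    """Build the argv for `nodetool setcompactionthroughput <MB/s>`."""
    cmd = ["nodetool"]
    if host is not None:
        cmd += ["-h", host]  # target a specific node instead of localhost
    return cmd + ["setcompactionthroughput", str(mb_per_sec)]


@contextmanager
def unthrottled_compaction(restore_mb_per_sec=16, host=None,
                           run=subprocess.check_call):
    """Set compaction throughput to 0 (unthrottled) for the duration of the
    block, then restore it.

    restore_mb_per_sec is an assumption: 16 is the stock default, but the
    value should match the node's cassandra.yaml.
    """
    run(nodetool_cmd(0, host))
    try:
        yield
    finally:
        # Restore the throttle even if the bulk load fails.
        run(nodetool_cmd(restore_mb_per_sec, host))
```

The bulk load (for example, submitting and waiting on the Hadoop job) would go
inside a "with unthrottled_compaction(host=...):" block. The setting applies
per node, so it would need to be applied to each node that receives streams.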
> Provide a way to disable automatic index rebuilds during bulk loading
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-5741
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5741
> Project: Cassandra
> Issue Type: Improvement
> Components: Hadoop
> Affects Versions: 1.2.6
> Reporter: Jim Zamata
>
> When using the BulkLoadOutputFormat the actual streaming of the SSTables into
> Cassandra is fast, but the index rebuilds can take several minutes. Cassandra
> does not send the response until after all of the rebuilds for a streaming
> session complete. This causes the tasks to appear to hang at 100%, since the
> record writer streams the files in its close method. If the rebuilding
> process takes too long, the tasks can actually time out.
> Many SQL databases provide bulk insert utilities that disable index updates
> to allow large amounts of data to be added quickly. This functionality would
> serve a similar purpose.
> An alternative might be an option that would allow the session to return once
> the SSTables had been successfully imported without waiting for the index
> builds to complete. However, I have noticed heavy CPU loads during the index
> rebuilds, so bulkload performance might be better if this step could be
> deferred until after all of the data is loaded.