[
https://issues.apache.org/jira/browse/CASSANDRA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Evgeny Ryabitskiy updated CASSANDRA-3545:
-----------------------------------------
Attachment: (was: IndexSearchPerformance.png)
> Fix very low Secondary Index performance
> ----------------------------------------
>
> Key: CASSANDRA-3545
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3545
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.7.0
> Reporter: Evgeny Ryabitskiy
> Fix For: 1.0.6
>
> Attachments: 0001-3545.patch, 0002-cleanup.patch
>
>
> When performing an index search plus value filtering over a large index row
> (~100k keys per index value) in chunks of 512-1024 keys, search time is
> about 8-12 seconds, which is very slow.
> After profiling I got this picture:
> 60% of search time is spent calculating MD5 hashes with MessageDigest (of
> course, this is because of RandomPartitioner).
> 33% of search time (half of all MD5 calculation time) is spent calculating
> MD5 twice to compare two row keys while rotating the index row to
> startKey (when performing the search query for the next chunk).
> I see several possible performance improvements:
> 1) Use a good algorithm to search for startKey in the sorted collection, one
> that is faster than iterating over all keys. This solution comes first because
> it is simple, needs only local code changes, and should solve the problem
> (speeding up search several times over).
> 2) Don't calculate the MD5 hash for startKey every time. It is best to compute
> it once (so search will be twice as fast).
> This also needs only local code changes.
> 3) Think about something faster than MD5 for hashing (like
> TigerRandomPartitioner with Tiger/128 hash).
> This needs research, and maybe that research has already been done.
> 4) Don't use Tokens (with MD5 hash for RandomPartitioner) for comparing and
> sorting keys in index rows. In index rows, keys can be stored and compared
> with simple Byte Comparator.
> This solution requires huge code changes.
> I'm going to start with the first solution. Further improvements can be made
> in follow-up tickets.
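
To make improvements 1) and 2) from the description concrete, here is a minimal Java sketch. It is not the attached patch and does not use Cassandra's own classes: the class StartKeySearchSketch and the helpers md5Token and findStart are hypothetical names, and the token is a simplified stand-in (absolute value of the MD5 digest as a BigInteger) for what RandomPartitioner produces. The sketch assumes the tokens of an index row are already available in sorted order, hashes startKey once, and locates the chunk start with a binary search instead of iterating over all keys.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names come from Cassandra.
class StartKeySearchSketch {

    // Simplified stand-in for a RandomPartitioner-style token:
    // the absolute value of the key's MD5 digest as a BigInteger.
    static BigInteger md5Token(byte[] key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(md.digest(key)).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns the index of startToken in the sorted list, or the insertion
    // point (first element greater than startToken) if it is absent.
    // Assumes sortedTokens is sorted ascending and has no duplicates.
    static int findStart(List<BigInteger> sortedTokens, BigInteger startToken) {
        int idx = Collections.binarySearch(sortedTokens, startToken);
        return idx >= 0 ? idx : -(idx + 1);
    }

    public static void main(String[] args) {
        // Build a sorted "index row" of tokens for 100k made-up keys.
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            tokens.add(md5Token(("key" + i).getBytes(StandardCharsets.UTF_8)));
        }
        Collections.sort(tokens);

        // Improvement 2: hash startKey once, outside any comparison loop.
        BigInteger startToken = md5Token("key4242".getBytes(StandardCharsets.UTF_8));

        // Improvement 1: O(log n) lookup instead of scanning every key.
        int start = findStart(tokens, startToken);
        System.out.println("next chunk starts at position " + start);
    }
}
```

With an ArrayList, Collections.binarySearch needs roughly 17 comparisons for 100k entries, and no MD5 is computed during the search because both sides of each comparison are precomputed tokens. Whether tokens can be kept precomputed this way inside Cassandra's index row iteration is an assumption of this sketch.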