[ 
https://issues.apache.org/jira/browse/CASSANDRA-15397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001037#comment-17001037
 ] 

Chandrasekhar Thumuluru edited comment on CASSANDRA-15397 at 12/20/19 5:50 PM:
-------------------------------------------------------------------------------

[~benedict] — Thanks for your inputs. 
 * I'll rename the class. I intentionally didn't do it in the first version of 
PR so it looks less distracting. 
 * I'll definitely do the performance comparison with million+ SSTables. Please 
note, my previous tests were not produced from read SSTables. The SSTable 
metadata was generated with random distributions. You can refer to the test 
files attached and let me know if you have any suggestions. I guess not using 
the real SSTables is fair to compare the performance of IntervalTree?
 *  I definitely share your concern on potential slowness due to linear scan, 
but I shared some code references in this 
[doc|https://docs.google.com/document/d/1vwo9ArZbtgWUwJcvZGes_69YVh4yiP9c7NQFgI0iynQ/edit?usp=sharing]
  which makes me believe we are still good. Let me know your thought on that 
too. 
* I'm willing to try the improvement proposed to the algorithm. I'll talk to my 
team to gather context around what you are suggesting and get back to you if 
I've any questions. 
* I'm definitely willing to try the proposed changes and don't mind even if the 
assumption turns out to be wrong. 


was (Author: cthumuluru):
[~benedict] — Thanks for your inputs. 
 * I'll rename the class. I intentionally didn't do it in the first version of 
PR so it looks less distracting. 
 * I'll definitely do the performance comparison with million+ SSTables. Please 
note, my previous tests were not produced from read SSTables. The SSTable 
metadata was generated with random distributions. You can refer to the test 
files attached and let me know if you have any suggestions. I guess not using 
the real SSTables is fair to compare the performance of IntervalTree?
 *  I definitely share your concern on potential slowness due to linear scan, 
but I shared some code references in this 
[doc|https://docs.google.com/document/d/1vwo9ArZbtgWUwJcvZGes_69YVh4yiP9c7NQFgI0iynQ/edit?usp=sharing]
  which makes me believe we are still good. Let me know your thought on that 
too. 
* I'm willing to try the improvement proposed to the algorithm. I'll talk to my 
team to gather context around what you are suggesting and get back to you if 
I've any questions. 
* I'm definitely willing to try the proposed changes and don't mind even it the 
assumption turns out to be wrong. 

> IntervalTree performance comparison with Linear Walk and Binary Search based 
> Elimination. 
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15397
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15397
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/SSTable
>            Reporter: Chandrasekhar Thumuluru
>            Assignee: Chandrasekhar Thumuluru
>            Priority: Low
>              Labels: pull-request-available
>         Attachments: 95p_10000_SSTable_with_5000_Searches.png, 
> 95p_15000_SSTable_with_5000_Searches.png, 
> 95p_20000_SSTable_with_5000_Searches.png, 
> 95p_25000_SSTable_with_5000_Searches.png, 
> 95p_30000_SSTable_with_5000_Searches.png, 
> 95p_5000_SSTable_with_5000_Searches.png, 
> 99p_10000_SSTable_with_5000_Searches.png, 
> 99p_15000_SSTable_with_5000_Searches.png, 
> 99p_20000_SSTable_with_5000_Searches.png, 
> 99p_25000_SSTable_with_5000_Searches.png, 
> 99p_30000_SSTable_with_5000_Searches.png, 
> 99p_5000_SSTable_with_5000_Searches.png, IntervalList.java, 
> IntervalListWithElimination.java, IntervalTreeSimplified.java, 
> Mean_10000_SSTable_with_5000_Searches.png, 
> Mean_15000_SSTable_with_5000_Searches.png, 
> Mean_20000_SSTable_with_5000_Searches.png, 
> Mean_25000_SSTable_with_5000_Searches.png, 
> Mean_30000_SSTable_with_5000_Searches.png, 
> Mean_5000_SSTable_with_5000_Searches.png, TESTS-TestSuites.xml.lz4, 
> replace_intervaltree_with_intervallist.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Cassandra uses IntervalTrees to identify the SSTables that overlap with 
> search interval. In Cassandra, IntervalTrees are not mutated. They are 
> recreated each time a mutation is required. This can be an issue during 
> repairs. In fact we noticed such issues during repair. 
> Since lists are cache friendly compared to linked lists and trees, I decided 
> to compare the search performance with:
> * Linear Walk.
> * Elimination using Binary Search (idea is to eliminate intervals using start 
> and end points of search interval). 
> Based on the tests I ran, I noticed Binary Search based elimination almost 
> always performs similar to IntervalTree or out performs IntervalTree based 
> search. The cost of IntervalTree construction is also substantial and 
> produces lot of garbage during repairs. 
> I ran the tests using random intervals to build the tree/lists and another 
> randomly generated search interval with 5000 iterations. I'm attaching all 
> the relevant graphs. The x-axis in the graphs is the search interval 
> coverage. 10p means the search interval covered 10% of the intervals. The 
> y-axis is the time the search took in nanos. 
> PS: 
> # For the purpose of test, I simplified the IntervalTree by removing the data 
> portion of the interval.  Modified the template version (Java generics) to a 
> specialized version. 
> # I used the code from Cassandra version _3.11_.
> # Time in the graph is in nanos. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to