[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

Branimir Lambov (JIRA) Tue, 03 Jan 2017 07:34:21 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795310#comment-15795310
 ]


Branimir Lambov commented on CASSANDRA-9754:
--------------------------------------------

As this is not a critical bug-fix and hence the 2.1 version cannot go into the 
codebase, unfortunately I cannot justify investing the time to give it a full 
review until we have a trunk patch.

I looked at the main {{BirchReader/Writer}} components again as they are likely 
to stay the same for a trunk patch. Here are my comments:
- I still think not storing the first key in intermediate nodes would save 
significant amounts of space and time and should be implemented.
- There are a pair of methods called {{binarySearch}} which return the floor 
(less than or equal) for a given key. I would prefer them to be named after 
what they produce, as {{binarySearch}} implies a certain kind of result 
(negative for non-equal) and the fact that it is implemented through binary 
search is largely an implementation detail.
- [{{entries == 1}} 
check|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-8561257b0836a3403d14d5dac9f8b3d0R393]
 looks suspect, as there should be no need for one-entry nodes in the tree. 
Could you comment why it is necessary?
- I think it is better to handle the special meaning of empty key and the 
{{reversed}} flag first thing in [the {{search}} 
method|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-8561257b0836a3403d14d5dac9f8b3d0R432]
 rather than propagating it into the {{binarySearch}} calls, especially since 
you know the position of the first ({{0}}) and last 
({{descriptor.getFirstNodeOffset() - descriptor.getAlignedPageSize()}}) leaf 
nodes in the tree. The iterator initialization already does that.
- The meaning of "matching" in the [{{search}} 
doc|https://github.com/mkjellman/cassandra/commit/b17f2c1317326fac7b6864a2fc61d7ee2580f740#diff-8561257b0836a3403d14d5dac9f8b3d0R429]
 is unclear. What happens on no equal element present? If it returns the floor, 
please state so.
- {{BirchIterator}} could use a forward/reversed implementation split.
- There's a lot of potential off-by-one mishap in {{BirchIterator}}, and not 
only in the reverse case:
-- the first element returned in the forward case can be less than 
{{searchKey}} (if not equal);
-- the respective problem is also there in the reverse case;
-- the page we find the entry in during initialization ({{currentPage}}) is not 
the page we apply that index to during the first {{computeNext}} ({{currentPage 
- 1}});
-- the column iteration code probably masks the first two at some non-trivial 
efficiency cost, but the latter looks like something that can get to the 
surface as missed data.
- The [level generation loop in 
{{BirchWriter}}|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-c5fa7e9cc1eac71a75b38caa716f64c3R260]
 is effectively a {{while (true)}} loop as we always reset 
{{inProgressInnerNodeLevel}} before going into it.
- The list should never become empty, thus the [emptyness 
check|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-c5fa7e9cc1eac71a75b38caa716f64c3R282]
 is suspect -- if necessary it would indicate an error in the logic; I'd 
replace is with a non-empty assertion.

> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Michael Kjellman
>            Priority: Minor
>             Fix For: 4.x
>
>         Attachments: 0f8e28c220fd5af6c7b5dd2d3dab6936c4aa4b6b.patch, 
> gc_collection_times_with_birch.png, gc_collection_times_without_birch.png, 
> gc_counts_with_birch.png, gc_counts_without_birch.png, 
> perf_cluster_1_with_birch_read_latency_and_counts.png, 
> perf_cluster_1_with_birch_write_latency_and_counts.png, 
> perf_cluster_2_with_birch_read_latency_and_counts.png, 
> perf_cluster_2_with_birch_write_latency_and_counts.png, 
> perf_cluster_3_without_birch_read_latency_and_counts.png, 
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
>  Looking at a heap dump of 2.0 cluster, I found that majority of the objects 
> are IndexInfo and its ByteBuffers. This is specially bad in endpoints with 
> large CQL partitions. If a CQL partition is say 6,4GB, it will have 100K 
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for 
> GC. Can this be improved by not creating so many objects?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9754) Make index info heap friendly for large CQL partitions

Reply via email to