[
https://issues.apache.org/jira/browse/CASSANDRA-9754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795310#comment-15795310
]
Branimir Lambov commented on CASSANDRA-9754:
--------------------------------------------
As this is not a critical bug-fix and hence the 2.1 version cannot go into the
codebase, unfortunately I cannot justify investing the time to give it a full
review until we have a trunk patch.
I looked at the main {{BirchReader/Writer}} components again as they are likely
to stay the same for a trunk patch. Here are my comments:
- I still think not storing the first key in intermediate nodes would save
significant amounts of space and time and should be implemented.
- There are a pair of methods called {{binarySearch}} which return the floor
(less than or equal) for a given key. I would prefer them to be named after
what they produce, as {{binarySearch}} implies a certain kind of result
(negative for non-equal) and the fact that it is implemented through binary
search is largely an implementation detail.
- [{{entries == 1}}
check|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-8561257b0836a3403d14d5dac9f8b3d0R393]
looks suspect, as there should be no need for one-entry nodes in the tree.
Could you comment why it is necessary?
- I think it is better to handle the special meaning of empty key and the
{{reversed}} flag first thing in [the {{search}}
method|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-8561257b0836a3403d14d5dac9f8b3d0R432]
rather than propagating it into the {{binarySearch}} calls, especially since
you know the position of the first ({{0}}) and last
({{descriptor.getFirstNodeOffset() - descriptor.getAlignedPageSize()}}) leaf
nodes in the tree. The iterator initialization already does that.
- The meaning of "matching" in the [{{search}}
doc|https://github.com/mkjellman/cassandra/commit/b17f2c1317326fac7b6864a2fc61d7ee2580f740#diff-8561257b0836a3403d14d5dac9f8b3d0R429]
is unclear. What happens on no equal element present? If it returns the floor,
please state so.
- {{BirchIterator}} could use a forward/reversed implementation split.
- There's a lot of potential off-by-one mishap in {{BirchIterator}}, and not
only in the reverse case:
-- the first element returned in the forward case can be less than
{{searchKey}} (if not equal);
-- the respective problem is also there in the reverse case;
-- the page we find the entry in during initialization ({{currentPage}}) is not
the page we apply that index to during the first {{computeNext}} ({{currentPage
- 1}});
-- the column iteration code probably masks the first two at some non-trivial
efficiency cost, but the latter looks like something that can get to the
surface as missed data.
- The [level generation loop in
{{BirchWriter}}|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-c5fa7e9cc1eac71a75b38caa716f64c3R260]
is effectively a {{while (true)}} loop as we always reset
{{inProgressInnerNodeLevel}} before going into it.
- The list should never become empty, thus the [emptyness
check|https://github.com/apache/cassandra/compare/trunk...mkjellman:CASSANDRA-9754-2.1-v2#diff-c5fa7e9cc1eac71a75b38caa716f64c3R282]
is suspect -- if necessary it would indicate an error in the logic; I'd
replace is with a non-empty assertion.
> Make index info heap friendly for large CQL partitions
> ------------------------------------------------------
>
> Key: CASSANDRA-9754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9754
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Michael Kjellman
> Priority: Minor
> Fix For: 4.x
>
> Attachments: 0f8e28c220fd5af6c7b5dd2d3dab6936c4aa4b6b.patch,
> gc_collection_times_with_birch.png, gc_collection_times_without_birch.png,
> gc_counts_with_birch.png, gc_counts_without_birch.png,
> perf_cluster_1_with_birch_read_latency_and_counts.png,
> perf_cluster_1_with_birch_write_latency_and_counts.png,
> perf_cluster_2_with_birch_read_latency_and_counts.png,
> perf_cluster_2_with_birch_write_latency_and_counts.png,
> perf_cluster_3_without_birch_read_latency_and_counts.png,
> perf_cluster_3_without_birch_write_latency_and_counts.png
>
>
> Looking at a heap dump of 2.0 cluster, I found that majority of the objects
> are IndexInfo and its ByteBuffers. This is specially bad in endpoints with
> large CQL partitions. If a CQL partition is say 6,4GB, it will have 100K
> IndexInfo objects and 200K ByteBuffers. This will create a lot of churn for
> GC. Can this be improved by not creating so many objects?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)