[
https://issues.apache.org/jira/browse/CASSANDRA-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376010#comment-17376010
]
Caleb Rackliffe edited comment on CASSANDRA-16776 at 8/5/21, 5:04 PM:
----------------------------------------------------------------------
[trunk|https://github.com/apache/cassandra/pull/1098]
[CircleCI|https://app.circleci.com/pipelines/github/maedhroz/cassandra?branch=CASSANDRA-16776]
To see how the patch reduces allocations, enable compaction profiling in
{{CompactionAllocationTest}} and run the test
{{widePartitionsSingleIndexedColumn}}. (This test indexes only one of 4 normal
columns.) You should see about a 13% reduction in bytes allocated for index
builds.
For example, without the patch:
{noformat}
INFO [main] 2021-07-06 15:16:01,720 CompactionAllocationTest.java:466 - *** widePartitionsSingleIndexedColumn compaction summary
INFO [main] 2021-07-06 15:16:01,720 CompactionAllocationTest.java:467 - 463337000 bytes, 13099437 objects, 2145078 /partition, 2145 /row, 0 cpu
{noformat}
...then with the patch...
{noformat}
INFO [main] 2021-07-06 15:11:51,958 CompactionAllocationTest.java:466 - *** widePartitionsSingleIndexedColumn compaction summary
INFO [main] 2021-07-06 15:11:51,958 CompactionAllocationTest.java:467 - 402830648 bytes, 11802336 objects, 1864956 /partition, 1864 /row, 0 cpu
{noformat}
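The "about 13%" figure can be checked directly from the two summaries. The snippet below is only a small arithmetic sketch; the class and method names are hypothetical and not part of the patch or of {{CompactionAllocationTest}}. With the numbers above it gives roughly a 13.1% reduction in bytes and a 9.9% reduction in objects.

```java
public class AllocationDelta {
    // Percentage reduction going from `before` to `after`.
    public static double percentReduction(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Figures copied from the two compaction summaries in this comment.
        System.out.printf("bytes:   %.1f%%%n", percentReduction(463337000L, 402830648L));
        System.out.printf("objects: %.1f%%%n", percentReduction(13099437L, 11802336L));
    }
}
```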
was (Author: maedhroz):
[trunk|https://github.com/apache/cassandra/pull/1098]
[CircleCI
J8|https://app.circleci.com/pipelines/github/maedhroz/cassandra/284/workflows/d095f8c2-d17d-4f9f-b5dd-0ab50f98901f]
[CircleCI
J11|https://app.circleci.com/pipelines/github/maedhroz/cassandra/284/workflows/4664cf14-7e0c-4680-8154-0a4fd340770a]
> modify SecondaryIndexManager#indexPartition() to retrieve only columns for
> which indexes are actually being built
> -----------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-16776
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16776
> Project: Cassandra
> Issue Type: Improvement
> Components: Feature/2i Index
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 4.x
>
> Attachments: index1.png, index2.png
>
>
> Secondary indexes are (for the moment) built as special compaction tasks via
> {{SecondaryIndexBuilder}}. From a profiling perspective, the fun begins in
> {{SecondaryIndexManager.indexPartition()}}. The work above it in
> {{SecondaryIndexBuilder}} is just key iteration.
> !index1.png!
> Two basic things happen in {{indexPartition()}}. First, we read a single
> partition in its entirety, and then we send individual rows to the
> {{Indexer}}. When we read these partitions, we use {{ColumnFilter.all()}},
> which ends up materializing full rows, even when we’re indexing a single
> column (or, more generally, only the columns needed by the indexes
> participating in the build). If we narrowed this to fetch only the necessary
> columns, we might be able to create less garbage in
> {{AbstractBTreePartition#searchIterator()}} when we create a copy of the
> underlying full row from disk.
> In some initial testing, I’ve been using a simple schema with fairly narrow
> rows.
> {noformat}
> CREATE TABLE tlp_stress.allow_filtering (
>     partition_id text,
>     row_id int,
>     payload text,
>     value int,
>     PRIMARY KEY (partition_id, row_id)
> ) WITH CLUSTERING ORDER BY (row_id ASC);
> {noformat}
> The price of deserializing these rows is still visible, however, in the
> results of some basic sampling profiling.
> !index2.png!
> The possible optimization above to avoid unnecessary copying of a row’s
> columns would also narrow cell deserialization only to indexed cells, which
> would probably be very beneficial for index builds with very wide rows. One
> minor wrinkle in all of this is that since 3.0, it has been possible to
> create indexes on entire rows, rather than single columns, so we’d have to
> keep that case in mind.
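As a rough illustration of the narrowing idea in the issue description, here is a minimal, self-contained sketch. It is a toy model, not Cassandra code: {{IndexTarget}} and {{columnsToFetch}} are hypothetical names, and the real change would presumably build a narrower {{ColumnFilter}} inside {{indexPartition()}} instead. The sketch takes the union of columns needed by the indexes in the build, and falls back to fetching everything as soon as one index covers entire rows.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

public class IndexBuildColumns {
    /** Hypothetical stand-in for an index definition; not a Cassandra class. */
    public static final class IndexTarget {
        final String name;
        final String column; // null means the index covers entire rows

        private IndexTarget(String name, String column) {
            this.name = name;
            this.column = column;
        }

        public static IndexTarget onColumn(String name, String column) {
            return new IndexTarget(name, column);
        }

        public static IndexTarget onWholeRow(String name) {
            return new IndexTarget(name, null);
        }
    }

    /**
     * Returns the union of columns the given indexes need, or an empty
     * Optional when at least one index covers entire rows, in which case
     * the caller should keep fetching all columns.
     */
    public static Optional<Set<String>> columnsToFetch(Collection<IndexTarget> indexes) {
        Set<String> needed = new TreeSet<>();
        for (IndexTarget index : indexes) {
            if (index.column == null) {
                return Optional.empty();
            }
            needed.add(index.column);
        }
        return Optional.of(needed);
    }

    public static void main(String[] args) {
        // Column names follow the example schema: payload and value are the
        // non-key columns; the index names are made up for illustration.
        IndexTarget valueIdx = IndexTarget.onColumn("value_idx", "value");
        IndexTarget rowIdx = IndexTarget.onWholeRow("row_idx");
        System.out.println(columnsToFetch(Arrays.asList(valueIdx)));
        System.out.println(columnsToFetch(Arrays.asList(valueIdx, rowIdx)));
    }
}
```

In this model, a build that only involves an index on {{value}} would skip {{payload}} entirely, while adding a whole-row index to the same build forces the full fetch again.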