[ 
https://issues.apache.org/jira/browse/CASSANDRA-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378319#comment-17378319
 ] 

Aleksei Zotov edited comment on CASSANDRA-16776 at 7/9/21, 9:44 PM:
--------------------------------------------------------------------

Hi [~maedhroz]

I checked the PR and the code looks good to me (I left a couple of minor 
comments). I'm definitely not an expert in that part of the code, but logically 
your changes are clear and make perfect sense to me.

PS:

I'm trying to answer Benjamin's call for reviewers. A non-committer review is 
not enough for the change to be merged, but I hope it is still useful.



> modify SecondaryIndexManager#indexPartition() to retrieve only columns for 
> which indexes are actually being built
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16776
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16776
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/2i Index
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 4.x
>
>         Attachments: index1.png, index2.png
>
>
> Secondary indexes are (for the moment) built as special compaction tasks via 
> {{SecondaryIndexBuilder}}. From a profiling perspective, the fun begins in 
> {{SecondaryIndexManager.indexPartition()}}. The work above it in 
> {{SecondaryIndexBuilder}} is just key iteration.
>  !index1.png! 
> Two basic things happen in {{indexPartition()}}. First, we read a single 
> partition in its entirety, and then we send individual rows to the 
> {{Indexer}}. When we read these partitions, we use {{ColumnFilter.all()}}, 
> which ends up materializing full rows, even when we’re indexing a single 
> column (or at least only the columns needed by the indexes participating 
> in the build). If we narrowed this to fetch only the necessary 
> columns, we might be able to create less garbage in 
> {{AbstractBTreePartition#searchIterator()}} when we create a copy of the 
> underlying full row from disk.
> In some initial testing, I’ve been using a simple schema with fairly narrow 
> rows.
> {noformat}
> CREATE TABLE tlp_stress.allow_filtering (
>     partition_id text,
>     row_id int,
>     payload text,
>     value int,
>     PRIMARY KEY (partition_id, row_id)
> ) WITH CLUSTERING ORDER BY (row_id ASC)
> {noformat}
> The price of deserializing these rows is still visible, however, in the 
> results of some basic sampling profiling.
>  !index2.png! 
> The possible optimization above to avoid unnecessary copying of a row’s 
> columns would also narrow cell deserialization only to indexed cells, which 
> would probably be very beneficial for index builds with very wide rows. One 
> minor wrinkle in all of this is that since 3.0, it has been possible to 
> create indexes on entire rows, rather than single columns, so we’d have to 
> keep that case in mind.
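
For readers following the description above, here is a minimal sketch of the narrowing idea, including the row-level index wrinkle. It is not the code from the PR and does not use Cassandra's classes: IndexBuildColumns, IndexDef, columnsToFetch() and project() are hypothetical names made up for illustration, and a row is modelled as a plain Map. The only assumption carried over from the ticket is that a build touching a row-level index has to fall back to reading every column.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model; none of these types exist in Cassandra.
final class IndexBuildColumns {

    /** Hypothetical index definition; a null targetColumn models a row-level index. */
    static final class IndexDef {
        final String name;
        final String targetColumn;

        IndexDef(String name, String targetColumn) {
            this.name = name;
            this.targetColumn = targetColumn;
        }

        boolean indexesWholeRow() {
            return targetColumn == null;
        }
    }

    /**
     * Returns the columns the build needs to read. An empty Optional means
     * "read all columns", which is the fallback when any participating
     * index covers entire rows.
     */
    static Optional<Set<String>> columnsToFetch(Collection<IndexDef> building) {
        Set<String> columns = new TreeSet<>();
        for (IndexDef index : building) {
            if (index.indexesWholeRow()) {
                return Optional.empty();
            }
            columns.add(index.targetColumn);
        }
        return Optional.of(columns);
    }

    /** Copies only the requested cells, so unneeded cells are never materialized. */
    static Map<String, Object> project(Map<String, Object> fullRow, Optional<Set<String>> columns) {
        if (!columns.isPresent()) {
            return new TreeMap<>(fullRow); // row-level index: keep everything
        }
        Map<String, Object> narrowed = new TreeMap<>();
        for (String column : columns.get()) {
            if (fullRow.containsKey(column)) {
                narrowed.put(column, fullRow.get(column));
            }
        }
        return narrowed;
    }
}
```

With the test schema above and a single index on value, columnsToFetch() would return only value, so payload would not be copied for each row; adding a row-level index to the same build would switch back to reading all columns.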


