[jira] [Updated] (CASSANDRA-15432) The "read defragmentation" optimization does not work

Sylvain Lebresne (Jira) Mon, 17 Aug 2020 02:48:58 -0700


     [ https://issues.apache.org/jira/browse/CASSANDRA-15432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sylvain Lebresne updated CASSANDRA-15432:
-----------------------------------------
          Fix Version/s:     (was: 3.11.x)
                             (was: 4.x)
                             (was: 3.0.x)
                         4.0-beta2
                         3.11.8
                         3.0.22
          Since Version: 1.1.0
    Source Control Link:
             3.0: https://github.com/apache/cassandra/commit/e2ecdf268a82fa3ac0f4c9fe77ab35bca33cc72a
             3.11: https://github.com/apache/cassandra/commit/ecd23f1da5894511cccac6c8445f962f3b73f733
             trunk: https://github.com/apache/cassandra/commit/efce6b39fb557314fad0cb56b0
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

Thanks for the review. CI doesn't seem to show anything new broken so committed.

> The "read defragmentation" optimization does not work
> -----------------------------------------------------
>
>                 Key: CASSANDRA-15432
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15432
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Normal
>             Fix For: 3.0.22, 3.11.8, 4.0-beta2
>
>
> The so-called "read defragmentation" that was added way back with 
> CASSANDRA-2503 actually does not work, and never has. That is, the 
> defragmentation writes do happen, but they only add load on the nodes 
> without helping anything, and are thus a clear negative.
> The "read defragmentation" (which only impacts so-called "names queries") 
> kicks in when a read hits "too many" sstables (> 4 by default), and when it 
> does, it writes down the result of that read. The assumption being that the 
> next read for that data would only read the newly written data, which if not 
> still in memtable would at least be in a single sstable, thus speeding that 
> next read.
> Unfortunately, this is not how it works. When we defrag and write the result 
> of our original read, we do so with the timestamp of the data read (as we 
> should, changing the timestamp would be plain wrong). And as a result, 
> following reads will read that data first, but will have no way to tell that 
> no more sstables should be read. Technically, the 
> [{{reduceFilter}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L830]
>  call will not return {{null}} because the {{currentMaxTs}} will be higher 
> than at least some of the data in the result, and this holds until we've 
> read from as many sstables as in the original read.
> I see no easy way to fix this. It might be possible to make it work with 
> additional per-sstable metadata, but nothing sufficiently simple and cheap to 
> be worth it comes to mind. And I thus suggest simply removing that code.
> For the record, I'll note that there is actually a 2nd problem with that 
> code: currently, we "defrag" a read even if we didn't get data for everything 
> that the query requests. This is also "wrong" even if we ignore the first 
> issue: a following read that reads the defragmented data would also have 
> no way to know not to read more sstables to try to get the missing parts. 
> This problem would be fixable, but is obviously overshadowed by the previous 
> one anyway.
> Anyway, as mentioned, I suggest just removing the "optimization" (which, 
> again, never optimized anything) altogether, and I'm happy to provide the 
> simple patch.
> The only question might be in which versions? This impacts all versions, but 
> this isn't a correctness bug either, "just" a performance one. So do we want 
> 4.0 only or is there appetite for earlier?
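
The first problem in the quoted description (defragmented data keeps its original timestamps, so later reads cannot stop early) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cassandra's code: ToySSTable, ReadResult, canStop, read and defragment are hypothetical names, each old sstable is assumed to hold a single requested column, and the stop rule is a simplified stand-in for the reduceFilter check mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical, none of it is Cassandra's API.
class DefragToyModel {

    // A toy sstable: requested column name -> write timestamp of the cell it holds.
    static final class ToySSTable {
        final Map<String, Long> cells;

        ToySSTable(Map<String, Long> cells) {
            this.cells = cells;
        }

        long maxTimestamp() {
            return cells.values().stream().mapToLong(Long::longValue).max().orElse(Long.MIN_VALUE);
        }
    }

    // Outcome of one toy names query.
    static final class ReadResult {
        final Map<String, Long> newest = new HashMap<>(); // column -> newest timestamp seen
        int sstablesRead = 0;
    }

    // Assumed stop rule: skip the remaining sstables only if every requested
    // column already has a cell strictly newer than the next sstable's max timestamp.
    static boolean canStop(Map<String, Long> newest, Set<String> requested, long nextMaxTs) {
        for (String column : requested) {
            Long ts = newest.get(column);
            if (ts == null || ts <= nextMaxTs) {
                return false;
            }
        }
        return true;
    }

    // Visit sstables from highest to lowest max timestamp, stopping as early as the rule allows.
    static ReadResult read(List<ToySSTable> sstables, Set<String> requested) {
        List<ToySSTable> ordered = new ArrayList<>(sstables);
        ordered.sort((a, b) -> Long.compare(b.maxTimestamp(), a.maxTimestamp()));
        ReadResult result = new ReadResult();
        for (ToySSTable sstable : ordered) {
            if (canStop(result.newest, requested, sstable.maxTimestamp())) {
                break;
            }
            result.sstablesRead++;
            for (String column : requested) {
                Long ts = sstable.cells.get(column);
                if (ts != null) {
                    result.newest.merge(column, ts, Long::max);
                }
            }
        }
        return result;
    }

    // "Defragment": write the read result as a new sstable, keeping the original timestamps.
    static ToySSTable defragment(ReadResult result) {
        return new ToySSTable(new HashMap<>(result.newest));
    }

    // Returns {sstables read before defrag, sstables read after defrag}.
    static int[] demo() {
        Set<String> requested = Set.of("a", "b", "c", "d", "e");
        List<ToySSTable> sstables = new ArrayList<>();
        long ts = 50;
        for (String column : List.of("a", "b", "c", "d", "e")) {
            sstables.add(new ToySSTable(Map.of(column, ts))); // one column per sstable
            ts -= 10;
        }
        ReadResult first = read(sstables, requested);
        sstables.add(defragment(first));
        ReadResult second = read(sstables, requested);
        return new int[] { first.sstablesRead, second.sstablesRead };
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("before defrag: " + counts[0] + ", after defrag: " + counts[1]);
    }
}
```

In this toy setup the first read touches 5 sstables and the read after defragmentation touches 6: the defragmented sstable is read, but its oldest cells are never newer than the remaining sstables' max timestamps, so the stop rule never fires and the extra write only adds work.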



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

