[ 
https://issues.apache.org/jira/browse/CASSANDRA-18424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726350#comment-17726350
 ] 

Josh McKenzie commented on CASSANDRA-18424:
-------------------------------------------

Yeah - I took a look at the patch on the DS fork and it looks like in the "by 
bytes" case you had the luxury of relying on the CQLLimits and the machinery 
already expecting early termination of paging based on some kind of successful 
"I got as much as I asked for" logic in the {{BaseRowIterator}} / Row context.

The implementation I'm currently validating for paging across tombstones 
has the _rather painful_ property of needing a way to gracefully signal from 
a stopped transformation on an {{Unfiltered}} up to the 
{{AbstractQueryPager}} (or to have the pager poll for it, which is the route I 
went). The pager has to check whether any of the {{ReadCommand}}s it owns 
stopped due to tombstone limits in order to decide whether to adjust the 
clustering key and exhaustion state we send back to the client. The fact that 
we've spent a decade relying on "if something bad related to tombstones happens 
in the bowels of your unfiltered iteration, you will explode" made this... tricky.
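
To make the polling approach concrete, here is a minimal, self-contained sketch of the idea. It is a toy model and not the patch or Cassandra's actual classes: {{Item}}, {{StoppingIterator}}, {{Page}}, and {{fetchPage}} are hypothetical names, clustering keys are plain ints, and the tombstone count is assumed to reset on every page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy model; none of these types exist in Cassandra.
final class TombstonePagingSketch {

    /** Stand-in for an Unfiltered: a live row or a tombstone at a clustering position. */
    static final class Item {
        final int clustering;
        final boolean tombstone;
        Item(int clustering, boolean tombstone) {
            this.clustering = clustering;
            this.tombstone = tombstone;
        }
    }

    /** Stand-in for a stopping transformation: ends iteration quietly instead of throwing. */
    static final class StoppingIterator implements Iterator<Item> {
        private final Iterator<Item> source;
        private final int tombstoneLimit;
        private int tombstones = 0;
        private boolean stoppedOnTombstones = false;

        StoppingIterator(Iterator<Item> source, int tombstoneLimit) {
            this.source = source;
            this.tombstoneLimit = tombstoneLimit;
        }

        @Override public boolean hasNext() {
            return !stoppedOnTombstones && source.hasNext();
        }

        @Override public Item next() {
            Item item = source.next();
            if (item.tombstone && ++tombstones >= tombstoneLimit) {
                stoppedOnTombstones = true; // record the reason; do not throw
            }
            return item;
        }

        /** Polled by the pager after iteration ends. */
        boolean stoppedOnTombstones() { return stoppedOnTombstones; }
    }

    /** What goes back to the client: rows, a resume position, and an exhaustion flag. */
    static final class Page {
        final List<Integer> rows;
        final int resumeAfter;
        final boolean exhausted;
        Page(List<Integer> rows, int resumeAfter, boolean exhausted) {
            this.rows = rows;
            this.resumeAfter = resumeAfter;
            this.exhausted = exhausted;
        }
    }

    /** Stand-in for the pager: reads one page, then polls the iterator for why it stopped. */
    static Page fetchPage(List<Item> partition, int resumeAfter, int pageSize, int tombstoneLimit) {
        Iterator<Item> source =
            partition.stream().filter(i -> i.clustering > resumeAfter).iterator();
        StoppingIterator it = new StoppingIterator(source, tombstoneLimit);
        List<Integer> rows = new ArrayList<>();
        int last = resumeAfter;
        while (rows.size() < pageSize && it.hasNext()) {
            Item item = it.next();
            last = item.clustering; // advance the resume position past tombstones too
            if (!item.tombstone) rows.add(item.clustering);
        }
        // A tombstone stop is a short page, not the end of the data.
        boolean exhausted = !it.stoppedOnTombstones() && !it.hasNext();
        return new Page(rows, last, exhausted);
    }

    /** Ten positions, with tombstones at clustering 2 through 6. */
    static List<Item> sampleData() {
        List<Item> data = new ArrayList<>();
        for (int c = 0; c < 10; c++) data.add(new Item(c, c >= 2 && c <= 6));
        return data;
    }

    public static void main(String[] args) {
        List<Item> data = sampleData();
        int resumeAfter = -1;
        Page page;
        do {
            page = fetchPage(data, resumeAfter, 4, 3);
            System.out.println(page.rows + " resumeAfter=" + page.resumeAfter
                               + " exhausted=" + page.exhausted);
            resumeAfter = page.resumeAfter;
        } while (!page.exhausted);
    }
}
```

The point of the sketch is the last step of {{fetchPage}}: the iterator only records that it stopped on tombstones, and the pager reads that flag to return a short page with an advanced resume position and {{exhausted=false}}, where the current behaviour would surface an exception.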

Intersection with {{RTBoundCloser}} is also a bit tricky; think I have a 
solution around that as well.

I've chatted with [~jlewandowski] a bit about the topic; will circle back once 
I have a patch to attach to the ticket here.

> Implement graceful paging across tombstones with short-circuit on paging 
> rather than throwing TombstoneOverwhelmingExceptions
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18424
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18424
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Messaging/Client, Messaging/Internode
>            Reporter: Josh McKenzie
>            Assignee: Josh McKenzie
>            Priority: Normal
>
> We implemented the hard stop with a {{TombstoneOverwhelmingException}} almost 
> a decade ago since paging across many tombstones was the most common way for 
> nodes to OOM as they iterated across all this data during queries and paging.
> With our current implementations and architecture / codebase, we should be 
> able to combine the {{StoppingTransformation}} and existing {{clustering}} 
> blob we pass back to clients to allow clients to optionally page across 
> tombstones when using the async api via the driver and short-circuit a page 
> when they hit the tombstone failure threshold rather than throwing a 
> {{TombstoneOverwhelmingException}}. This would allow for more flexible 
> data modeling on users' side as well as removing one of the fairly rough 
> edges of our APIs we're currently constrained by.
> Making sure this is correct will require extensive fuzz-testing of 
> pagination; this should likely happen in the Harry project but we could also 
> have a bespoke model / implementation in the C* codebase we rely on in the 
> interim.
> Client warnings at the current default levels would remain; the gap between 
> warn and "short-circuit pages" (100x ratio currently, 1000 vs. 100000) should 
> be sufficient for clients to take action on their data models well before 
> they hit this limit.
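
As a rough illustration of the gap described above, the sketch below classifies a per-query tombstone count against the two default levels quoted in the description (1000 and 100000). The class, enum, and method names are hypothetical, and the strict "greater than" comparison is an assumption, not something the ticket states.

```java
// Hypothetical illustration; names and the strict ">" comparison are assumptions.
final class TombstoneThresholdSketch {

    static final int WARN_THRESHOLD = 1_000;            // default warn level from the description
    static final int SHORT_CIRCUIT_THRESHOLD = 100_000; // default failure level from the description

    enum Outcome { OK, WARN, SHORT_CIRCUIT_PAGE }

    /** Maps a tombstone count to the behaviour proposed in the ticket. */
    static Outcome classify(int tombstonesScanned) {
        if (tombstonesScanned > SHORT_CIRCUIT_THRESHOLD) return Outcome.SHORT_CIRCUIT_PAGE;
        if (tombstonesScanned > WARN_THRESHOLD) return Outcome.WARN;
        return Outcome.OK;
    }

    /** Ratio between the two levels, i.e. the headroom a client has after the first warning. */
    static int headroomRatio() {
        return SHORT_CIRCUIT_THRESHOLD / WARN_THRESHOLD;
    }

    public static void main(String[] args) {
        int[] samples = {500, 5_000, 250_000};
        for (int count : samples) {
            System.out.println(count + " tombstones: " + classify(count));
        }
        System.out.println("headroom ratio: " + headroomRatio());
    }
}
```

Under these assumptions, a query starts producing client warnings two orders of magnitude before a page would be short-circuited.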


