[
https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122830#comment-14122830
]
Christian Spriegel commented on CASSANDRA-7886:
-----------------------------------------------
[~slebresne]:
{quote}I meant that if every requests hits TombstoneOverwhelmingException{quote}
My story is a bit longer. In normal operation this does not happen, not even
close.
But sometimes our customers mess up in their backend systems: occasionally a
backend sends the same request in an endless loop, deleting and re-creating
a column in a row. This creates many tombstones very quickly.
Currently a single such customer brings down our entire landscape, because his
requests pile up in our Tomcat, which also affects other customers.
If Cassandra were to fail instantly, then his requests would run into an error
(which they should, because he is using it wrong) and therefore would not pile
up.
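To illustrate the difference, here is a minimal Java sketch of the two behaviours from the coordinator's point of view. It is not Cassandra code: the class, the method names, and the error type are hypothetical stand-ins, and the 500 ms value only plays the role of read_request_timeout_in_ms.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names are Cassandra's real classes.
class FailFastReadSketch {

    // Stand-in for the error a replica hits when it scans too many tombstones.
    static class TombstoneOverwhelmedError extends RuntimeException {
        TombstoneOverwhelmedError(String message) {
            super(message);
        }
    }

    // Behaviour described in the issue: the replica drops the read and never
    // answers, so the response future is never completed.
    static CompletableFuture<String> replicaDropsSilently() {
        return new CompletableFuture<>();
    }

    // Proposed behaviour: the replica reports the failure to the coordinator.
    static CompletableFuture<String> replicaReportsFailure() {
        CompletableFuture<String> response = new CompletableFuture<>();
        response.completeExceptionally(
                new TombstoneOverwhelmedError("too many tombstones scanned"));
        return response;
    }

    // Coordinator side: wait up to timeoutMs for the replica, then describe the outcome.
    static String coordinate(CompletableFuture<String> replicaResponse, long timeoutMs) {
        try {
            return "OK: " + replicaResponse.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMEOUT";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        long timeoutMs = 500; // assumed value, standing in for read_request_timeout_in_ms

        long start = System.nanoTime();
        String silent = coordinate(replicaDropsSilently(), timeoutMs);
        long afterSilent = System.nanoTime();
        String reported = coordinate(replicaReportsFailure(), timeoutMs);
        long afterReported = System.nanoTime();

        System.out.println(silent + " after " + (afterSilent - start) / 1_000_000 + " ms");
        System.out.println(reported + " after " + (afterReported - afterSilent) / 1_000_000 + " ms");
    }
}
```

In the silent case the caller is blocked for the whole timeout, which is what lets requests pile up in the application server. In the reporting case the caller gets its error almost immediately and the request thread is freed.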
{quote}Sure, but setting a fixversion is never a promise. {quote}
Thanks!
I know. But at least somebody will have it on his radar. (I hope) :-)
> TombstoneOverwhelmingException should not wait for timeout
> ----------------------------------------------------------
>
> Key: CASSANDRA-7886
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Tested with Cassandra 2.0.8
> Reporter: Christian Spriegel
> Priority: Minor
> Fix For: 3.0
>
>
> *Issue*
> When you have TombstoneOverwhelmingExceptions occurring in queries, this will
> cause the query to be simply dropped on every data-node, but no response is
> sent back to the coordinator. Instead the coordinator waits for the specified
> read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application
> is waiting for the timeout interval for every request. Therefore, if our
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later)
> our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when
> they run into a TombstoneOverwhelmingException. Then the coordinator does not
> have to wait for the timeout-interval.