[ 
https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249056#comment-14249056
 ] 

Tyler Hobbs commented on CASSANDRA-7886:
----------------------------------------

bq. Hi Tyler Hobbs, sorry I kept you waiting for so long.

No worries, I know you're busy :)

bq. The commented code was meant as a preparation for WriteFailureExceptions. 
Does it perhaps make sense to fully add WriteFailureException? As a follow up 
ticket, we could implement it then for the different writes. Or do you want me 
to get rid of it?

I do think it's a good idea to implement something similar for writes, and 
splitting that into a second ticket would be good.  So go ahead and delete the 
comments for this patch.

{quote}
Just to make sure that we don't touch anything new here: TOEs are logged inside 
SliceQueryFilter.collectReducedColumns already. I simply took this catch block 
from the ReadVerbHandler/RangeSliceVerbHandler and put into 
StorageProxy/MessageDeliveryTask.
I don't like that either, but I did not want to touch it. Do you still want me 
to change it?
{quote}

Yes, go ahead and remove those other try/catch blocks as well.  I can't see a 
reason why they should be suppressed once the logging statement is removed.

bq. I merged ReadTimeoutException|ReadFailureException into a single catch 
block.

Cool.  The way you did it there looks perfect.  Further up in StorageProxy 
there's an almost identical chunk of code.  Can you condense that one as well?
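
For reference, the condensed form is just a Java 7 multi-catch. This is a minimal sketch only: the two exception classes, the {{Read}} interface, and {{execute()}} are hypothetical stand-ins, not the actual StorageProxy code.

```java
public class MultiCatchSketch {
    // Hypothetical stand-ins for the real exception types; assumptions for illustration.
    public static class ReadTimeoutException extends Exception {}
    public static class ReadFailureException extends Exception {}

    // Hypothetical read operation that may fail in either way.
    public interface Read {
        String run() throws ReadTimeoutException, ReadFailureException;
    }

    // Counts failed reads so that both exception types share the same bookkeeping.
    public static int failures = 0;

    public static String execute(Read read) {
        try {
            return read.run();
        } catch (ReadTimeoutException | ReadFailureException e) {
            // A single handler covers both types instead of two identical catch blocks.
            failures++;
            return "failed: " + e.getClass().getSimpleName();
        }
    }
}
```

Both exception types go through the same handler, so the duplicated handling further up in StorageProxy could collapse the same way.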

bq. I also added the last cell-name to the TOE, so that an administrator can 
get an estimate where to look for the tombstones. This doesn't really match the 
ticket's new name, but is related to my original issue.

The many implementations of CellName don't implement {{toString()}}, so I think 
you want {{container.getComparator().getString(cell.name())}} instead.
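
To illustrate why the difference matters, here is a rough sketch. {{RawName}} and {{Utf8Comparator}} are hypothetical stand-ins (not the real CellName or comparator classes), and it assumes the name bytes are UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CellNameStringSketch {
    // Hypothetical name type that, like the case described above, does not override toString().
    public static final class RawName {
        final ByteBuffer bytes;

        RawName(ByteBuffer bytes) {
            this.bytes = bytes;
        }

        public static RawName of(String s) {
            return new RawName(ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)));
        }
    }

    // Hypothetical comparator-like helper that knows how to decode the name bytes.
    public static final class Utf8Comparator {
        public String getString(RawName name) {
            // duplicate() so decoding does not move the original buffer's position
            return StandardCharsets.UTF_8.decode(name.bytes.duplicate()).toString();
        }
    }
}
```

Here {{RawName.of("col1").toString()}} falls back to {{Object.toString()}} (class name plus hash code), while {{new Utf8Comparator().getString(...)}} returns the readable {{col1}}, which is what an administrator would want to see in the exception message.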

> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>              Labels: protocolv4
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt, 
> 7886_v4_trunk.txt
>
>
> *Issue*
> When you have TombstoneOverwhelmingExceptions occurring in queries, this will 
> cause the query to be simply dropped on every data-node, but no response is 
> sent back to the coordinator. Instead the coordinator waits for the specified 
> read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application 
> is waiting for the timeout interval for every request. Therefore, if our 
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later) 
> our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when 
> they run into a TombstoneOverwhelmingException. Then the coordinator does not 
> have to wait for the timeout-interval.
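
The behaviour proposed in the description can be sketched with plain {{java.util.concurrent}} types. This is only an illustration under assumptions: {{awaitReply()}} and the use of {{CompletableFuture}} are hypothetical and do not reflect how Cassandra's messaging layer is implemented.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FailFastSketch {
    // Hypothetical coordinator-side wait: returns as soon as the replica replies,
    // whether with data or with an error, and only times out if nothing arrives.
    public static String awaitReply(CompletableFuture<String> reply, long timeoutMs) {
        try {
            return "data: " + reply.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // The replica reported a failure, so there is no need to wait out the timeout.
            return "failure: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            // No reply at all: this is the only case that costs the full timeout.
            return "timeout";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }
}
```

If the replica completes the future exceptionally, the coordinator sees the failure immediately; only a replica that never answers costs the full timeout interval.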



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
