[
https://issues.apache.org/jira/browse/TEPHRA-257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202550#comment-16202550
]
Andreas Neumann commented on TEPHRA-257:
----------------------------------------
It turns out that this cannot be fixed as long as Tephra uses Thrift. In
theory, we could attempt to modify this part of Thrift's ProcessFunction class:
{code}
if (!isOneway()) {
  oprot.writeMessageBegin(new TMessage(getMethodName(), TMessageType.REPLY, seqid));
  result.write(oprot);
  oprot.writeMessageEnd();
  oprot.getTransport().flush();
}
{code}
by wrapping it in a try block and catching any socket exceptions. However,
flush() does not actually write to the socket: due to Thrift's async nature, it
only flushes to a write request queue, and the socket exception is raised later
in the worker thread that performs the write. By that time, we have lost the
context of the request and have no callback with which to abort the
transaction.
Thus marking this as won't fix.
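To illustrate the reasoning above, here is a minimal, self-contained sketch. It does not use Thrift: Transport, DirectTransport, QueuedTransport and sendResponseOrAbort are hypothetical names invented for this example, and the socket failure is simulated by throwing an IOException. It only shows the general effect: a try block around flush() can trigger an abort when the write happens on the calling thread, but not when flush() merely enqueues the write for a worker thread.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class FlushAbortSketch {

  /** Hypothetical transport abstraction; this is not a Thrift type. */
  interface Transport {
    void flush() throws IOException;
  }

  /** Writes on the calling thread, so a socket failure reaches the caller. */
  static class DirectTransport implements Transport {
    public void flush() throws IOException {
      // Simulated failure: the client has already timed out.
      throw new IOException("client closed the connection");
    }
  }

  /** Only enqueues the write; a separate worker thread performs it later. */
  static class QueuedTransport implements Transport {
    final ExecutorService worker = Executors.newSingleThreadExecutor();
    final AtomicBoolean workerSawFailure = new AtomicBoolean(false);

    public void flush() {
      // Returns immediately; nothing has been written to the socket yet.
      worker.execute(() -> {
        try {
          new DirectTransport().flush();
        } catch (IOException e) {
          // The failure surfaces here, with no reference to the transaction.
          workerSawFailure.set(true);
        }
      });
    }

    /** Waits for queued writes to finish and stops the worker thread. */
    void awaitWrites() {
      worker.shutdown();
      try {
        worker.awaitTermination(5, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  /** The idea from the issue: abort if sending the response fails. */
  static boolean sendResponseOrAbort(Transport transport) {
    try {
      transport.flush();
      return false; // no failure observed, the transaction stays open
    } catch (IOException e) {
      return true;  // this is where the server would abort the transaction
    }
  }

  public static void main(String[] args) {
    System.out.println("direct: aborted = "
        + sendResponseOrAbort(new DirectTransport()));
    QueuedTransport queued = new QueuedTransport();
    System.out.println("queued: aborted = " + sendResponseOrAbort(queued));
    queued.awaitWrites();
    System.out.println("queued: worker saw failure = "
        + queued.workerSawFailure.get());
  }
}
```

In the queued case the failure is only visible inside the worker's catch block, which knows nothing about the transaction that produced the response. That is the lost context described above.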
> If start() encounters an RPC timeout, an invalid transaction is left behind
> ---------------------------------------------------------------------------
>
> Key: TEPHRA-257
> URL: https://issues.apache.org/jira/browse/TEPHRA-257
> Project: Tephra
> Issue Type: Bug
> Components: core
> Affects Versions: 0.13.0-incubating
> Reporter: Andreas Neumann
> Assignee: Poorna Chandra
>
> Suppose the following scenario:
> - a Thrift client starts a transaction
> - the server responds, but for whatever reason it is slow
> - by the time the response is sent, the client has timed out the connection
> - now the server has started a transaction, but the client has no knowledge
> of it
> - that transaction will never be committed or aborted and eventually times out
> - it becomes an invalid transaction
> This is a common scenario when HDFS is slow and the write load is high: a
> lot of change IDs have to be written to a slow transaction log. In that
> situation we generate invalid transactions systematically, which eventually
> degrades the performance of the entire system.
> It would be good if the server could detect this situation and abort the
> transaction immediately. This is safe to do whenever sending of the response
> fails, because we know that the client did not receive it, and hence it will
> not generate data with that transaction id.
> This is a tricky change, though: Thrift does not give us a way to intercept
> exceptions from socket failures. We would have to copy a Thrift class
> (ProcessFunction) and change it to handle exceptions that occur during the
> write of the response.