Jean-Daniel Cryans created KUDU-1868:
----------------------------------------

             Summary: Java client mishandles socket read timeouts for scans
                 Key: KUDU-1868
                 URL: https://issues.apache.org/jira/browse/KUDU-1868
             Project: Kudu
          Issue Type: Bug
          Components: client
    Affects Versions: 1.2.0
            Reporter: Jean-Daniel Cryans


Scan calls from the Java client that take more than the socket read timeout get 
retried (unless the operation timeout has expired) instead of being killed. 
Users will see this:

{code}
org.apache.kudu.client.NonRecoverableException: Invalid call sequence ID in 
scan request
{code}

Note that the right behavior here would still end up killing the scanner, so 
this is really a problem the user has to deal with! It's usually caused by slow 
IO, combined with very selective scans.

Workaround: set defaultSocketReadTimeoutMs higher, ideally equal to 
defaultOperationTimeoutMs (the defaults are 10 and 30 seconds respectively). 
But really the user should investigate why single scans are so slow.
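
For reference, a minimal sketch of the workaround using the Java client's builder. The master address is a placeholder, and 30000 simply mirrors the default operation timeout mentioned above; the kudu-client jar is assumed to be on the classpath.

```java
import org.apache.kudu.client.KuduClient;

// "kudu-master.example.com:7051" is a placeholder address, not from this report.
KuduClient client = new KuduClient.KuduClientBuilder("kudu-master.example.com:7051")
    .defaultOperationTimeoutMs(30000)   // 30 s, the default per the description above
    .defaultSocketReadTimeoutMs(30000)  // raised from the 10 s default to match
    .build();
```

With both values equal, a slow scan call should hit the operation timeout rather than being retried after a socket read timeout.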

One potentially easy fix is to handle retries differently for scanners so that 
the user gets a nicer exception. A harder fix is to handle socket read timeouts 
completely differently: the timeout should be per-RPC and not per TabletClient 
like it is right now.
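
A rough sketch of what the easy fix could look like, to make the idea concrete. This is not the client's actual code: ScanRetrySketch, RpcKind, Decision and both methods are hypothetical names, and the rule "never retry a scan RPC after a socket read timeout" is an assumption drawn from the description above.

```java
public class ScanRetrySketch {
    /** Hypothetical RPC categories; the real client has no such enum. */
    public enum RpcKind { SCAN, OTHER }

    /** Hypothetical outcomes of the retry decision. */
    public enum Decision { RETRY, FAIL_OPERATION_TIMED_OUT, FAIL_SCAN_TIMED_OUT }

    /**
     * Decide what to do with an RPC whose socket read timed out.
     * nowMs and deadlineMs are absolute times in milliseconds.
     */
    public static Decision onSocketReadTimeout(RpcKind kind, long nowMs, long deadlineMs) {
        if (nowMs >= deadlineMs) {
            // The operation timeout has expired, so nothing is retried.
            return Decision.FAIL_OPERATION_TIMED_OUT;
        }
        if (kind == RpcKind.SCAN) {
            // Assumption: retrying a scan call only leads to the
            // "Invalid call sequence ID" error, so fail right away instead.
            return Decision.FAIL_SCAN_TIMED_OUT;
        }
        // Other RPCs keep the existing retry behavior.
        return Decision.RETRY;
    }

    /** Build the message a user would see for a given decision. */
    public static String describe(Decision decision, long socketReadTimeoutMs) {
        switch (decision) {
            case FAIL_SCAN_TIMED_OUT:
                return "Scan call took longer than the socket read timeout ("
                    + socketReadTimeoutMs + " ms); the scanner was closed. "
                    + "Consider raising defaultSocketReadTimeoutMs.";
            case FAIL_OPERATION_TIMED_OUT:
                return "Operation timed out.";
            default:
                return "Retrying.";
        }
    }
}
```

The scanner still dies in this sketch, but the user gets a message that names the timeout instead of a sequence ID error.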



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
