[ 
https://issues.apache.org/jira/browse/HADOOP-17462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HADOOP-17462.
-------------------------------------
    Resolution: Not A Problem

Thanks [~sjlee0] for pointing out that the {{call}} object is synchronized 
on.

The reference docs implement this a bit differently: there the guarded block 
lives in the same class as the flag, instead of an external class 
synchronizing on the object and accessing the variable, but I think the 
behavior should be the same.

https://docs.oracle.com/javase/tutorial/essential/concurrency/guardmeth.html
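
As a rough sketch of why the two should be equivalent (the class and method 
names here are hypothetical stand-ins, not the actual Hadoop code): the 
{{done}} flag below is deliberately not {{volatile}}, the writer sets it while 
holding the object's monitor, and an external reader synchronizes on the same 
object before checking it. Releasing a monitor happens-before any later 
acquisition of that same monitor, so the reader should see the update.

```java
import java.util.concurrent.TimeUnit;

public class GuardedCallSketch {

    /** Hypothetical stand-in for Client.Call; the flag is NOT volatile. */
    static class Call {
        boolean done;      // guarded by this Call instance's monitor
        String response;   // guarded by this Call instance's monitor

        /** Writer side: update state and wake waiters while holding the monitor. */
        synchronized void complete(String value) {
            response = value;
            done = true;
            notifyAll();
        }
    }

    /**
     * Reader side, living outside Call as in the ticket: it synchronizes on
     * the call object and waits. Returns null if the timeout expires first.
     */
    static String awaitResponse(Call call, long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (call) {
            while (!call.done) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return null; // timed out
                }
                // Releases the monitor while waiting, re-acquires it before returning.
                TimeUnit.NANOSECONDS.timedWait(call, remainingNanos);
            }
            return call.response;
        }
    }

    public static void main(String[] args) throws Exception {
        Call call = new Call();
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            call.complete("ok");
        });
        writer.start();
        System.out.println(awaitResponse(call, 5, TimeUnit.SECONDS));
        writer.join();
        // Nobody completes this call, so the wait should time out.
        System.out.println(awaitResponse(new Call(), 50, TimeUnit.MILLISECONDS));
    }
}
```

Running {{main}} should print {{ok}} and then {{null}}. This only illustrates 
the pattern; it does not prove anything about the real {{Client}} code paths.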

I'll test in more depth, and if I can definitively prove there is a problem, 
I'll re-open this ticket.

> Hadoop Client getRpcResponse May Return Wrong Result
> ----------------------------------------------------
>
>                 Key: HADOOP-17462
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17462
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java|Title=Client.java}
>   /** @return the rpc response or, in case of timeout, null. */
>   private Writable getRpcResponse(final Call call, final Connection connection,
>       final long timeout, final TimeUnit unit) throws IOException {
>     synchronized (call) {
>       while (!call.done) {
>         try {
>           AsyncGet.Util.wait(call, timeout, unit);
>           if (timeout >= 0 && !call.done) {
>             return null;
>           }
>         } catch (InterruptedException ie) {
>           Thread.currentThread().interrupt();
>           throw new InterruptedIOException("Call interrupted");
>         }
>       }
> ...
>   static class Call {
>     final int id;               // call id
>     final int retry;           // retry count
> ...
>     boolean done;               // true when call is done
> ...
> }
> {code}
> The {{done}} variable is not marked as {{volatile}}, so the thread that is 
> checking its status is free to cache the value and never reload it, even 
> though it is expected to be changed by a different thread.  The while loop may 
> be stuck waiting for the change while always looking at a cached value.  If 
> that happens, the timeout will expire and the method will return {{null}}.
> In previous versions of Hadoop, there was no timeout at this level, so this 
> would cause an endless loop.  It is a really tough error to track down if it 
> happens.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
