[ https://issues.apache.org/jira/browse/HADOOP-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598309#comment-17598309 ]

ASF GitHub Bot commented on HADOOP-18410:
-----------------------------------------

steveloughran opened a new pull request, #4839:
URL: https://github.com/apache/hadoop/pull/4839

   
   #4766 cherrypicked to branch-3.3
   
   ----
   
   HADOOP-16202 "Enhance openFile()" added asynchronous draining of the
   remaining bytes of an S3 HTTP input stream for those operations
   (unbuffer, seek) where it could avoid blocking the active
   thread.
   
   This patch fixes the asynchronous stream draining so that it actually
   completes and returns the HTTP connection to the pool. Without the fix,
   whenever unbuffer() or seek() was called on a stream and an asynchronous
   drain was triggered, the connection was never returned; eventually the
   pool would be empty and subsequent S3 requests would fail with the
   message "Timeout waiting for connection from pool".
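   
   For illustration only (not code in this patch), the failure mode could be
   triggered from any application that leaves unread bytes in the stream
   before calling unbuffer(); the path and loop count below are made up:
   
       import org.apache.hadoop.conf.Configuration;
       import org.apache.hadoop.fs.FSDataInputStream;
       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.fs.Path;
   
       public class UnbufferPoolExhaustion {
         public static void main(String[] args) throws Exception {
           // Hypothetical object; it must be large enough that unread bytes
           // remain and an asynchronous drain is triggered.
           Path path = new Path("s3a://example-bucket/large-object.bin");
           FileSystem fs = path.getFileSystem(new Configuration());
           byte[] buffer = new byte[4096];
           try (FSDataInputStream in = fs.open(path)) {
             for (int i = 0; i < 1000; i++) {
               in.seek(0);
               in.readFully(buffer, 0, buffer.length); // leave bytes unread
               in.unbuffer();   // pre-fix: the async drain leaked the connection
             }
           }
           // With the leak, later S3 requests fail with
           // "Timeout waiting for connection from pool".
         }
       }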
   
   The root cause was that even though the fields passed in to drain()
   become plain method-parameter references inside that method, in the
   lambda expression passed to submit() they were still direct references
   to the stream's mutable fields:
   
   operation = client.submit(
       () -> drain(uri, streamStatistics,
             false, reason, remaining,
             object, wrappedStream));  /* here */
   
   Those fields were only read during the asynchronous execution, by which
   point they could already have been set to null (or reassigned by a
   subsequent read).
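   
   This is standard Java lambda semantics rather than anything s3a-specific:
   a lambda which names an instance field captures "this", so the field is
   read when the task runs, not when it is submitted. A minimal standalone
   illustration (not the actual S3A code):
   
       import java.util.concurrent.ExecutorService;
       import java.util.concurrent.Executors;
   
       public class LambdaCaptureDemo {
         private String payload = "data";
   
         private void submitDrain(ExecutorService pool) {
           // The lambda captures 'this', not the current value of 'payload';
           // 'payload' is only read when the task eventually runs.
           pool.submit(() -> System.out.println("draining " + payload));
           payload = null;   // the task may now print "draining null"
         }
   
         public static void main(String[] args) {
           ExecutorService pool = Executors.newSingleThreadExecutor();
           new LambdaCaptureDemo().submitDrain(pool);
           pool.shutdown();
         }
       }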
   
   A new SDKStreamDrainer class performs the draining; it is a Callable
   and can be submitted directly to the executor pool.
   
   The class is used in both the classic and prefetching s3a input streams.
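   
   In outline (a rough sketch under assumed field names, not the real
   SDKStreamDrainer), the drainer snapshots everything it needs when it is
   constructed, so the submitted task no longer touches the stream's
   mutable state:
   
       import java.io.IOException;
       import java.io.InputStream;
       import java.util.concurrent.Callable;
   
       class StreamDrainerSketch implements Callable<Boolean> {
         private final InputStream wrappedStream;   // snapshots taken at construction
         private final long remaining;
   
         StreamDrainerSketch(InputStream wrappedStream, long remaining) {
           this.wrappedStream = wrappedStream;
           this.remaining = remaining;
         }
   
         @Override
         public Boolean call() {
           byte[] buffer = new byte[8192];
           long toRead = remaining;
           try {
             // Read and discard what is left so the connection can be reused.
             while (toRead > 0) {
               int read = wrappedStream.read(buffer, 0,
                   (int) Math.min(buffer.length, toRead));
               if (read < 0) {
                 break;
               }
               toRead -= read;
             }
             return true;
           } catch (IOException e) {
             return false;   // the real code aborts/closes the stream on failure
           } finally {
             try {
               wrappedStream.close();
             } catch (IOException ignored) {
             }
           }
         }
       }
   
   Because it is a plain Callable over final fields, the drainer can be
   handed straight to the executor, and nothing the input stream does
   afterwards (nulling fields, starting another read) can affect it.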
   
   Also, calling unbuffer() switches the S3AInputStream from adaptive
   to random IO mode; that is, it is considered a cue that future
   IO will not be sequential, whole-file reads.
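   
   (Applications that already know their access pattern do not have to rely
   on that cue; the long-standing s3a option below selects random IO
   explicitly. Shown only as context, not part of this change.)
   
       import org.apache.hadoop.conf.Configuration;
   
       public class RandomIoConfig {
         public static Configuration randomIoConf() {
           Configuration conf = new Configuration();
           // s3a read policy: "normal" (adaptive), "sequential" or "random"
           conf.set("fs.s3a.experimental.input.fadvise", "random");
           return conf;
         }
       }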
   
   Contributed by Steve Loughran.
   
   Change-Id: Ia43339302dbe837ceee4bcfc83fd9624b3c4992c
   
   
   ### Description of PR
   
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> S3AInputStream.unbuffer() async drain not releasing http connections
> --------------------------------------------------------------------
>
>                 Key: HADOOP-18410
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18410
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.9
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Blocker
>              Labels: pull-request-available
>
> Impala TPC-DS runs against S3 are hitting problems with timeouts fetching 
> http connections from the s3a fs pool. Disabling s3a async drain makes this 
> problem *go away*. Assumption: either those async ops are blocking, or they 
> are not releasing references properly.


