[ 
https://issues.apache.org/jira/browse/HADOOP-18521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629339#comment-17629339
 ] 

Steve Loughran edited comment on HADOOP-18521 at 11/7/22 6:55 PM:
------------------------------------------------------------------

This bug can be fixed by deleting one line from 
{{ReadBufferManager.purgeBuffersForStream()}}.

I am not going to provide a test for this, as it needs a multi-GB CSV file and 
a build of Spark configured to use your Hadoop distribution.

The latest build of cloudstore (https://github.com/steveloughran/cloudstore) 
has a command, {{mkcsv}}, which can create the file; its man page includes the 
Spark binding info: 
https://github.com/steveloughran/cloudstore/blob/trunk/src/main/site/mkcsv.md

Along with the fix, I am going to include a stream capability which both the 
filesystem and the stream can be probed for, declaring that the fix is in. 
This allows programmatic verification of the safety of releases, including 
with the cloudstore {{pathcapabilities}} command.

Update: no, the proposed fix is insufficient.



> ABFS ReadBufferManager buffer sharing across concurrent HTTP requests
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-18521
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18521
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 3.3.2, 3.3.3, 3.3.4
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>              Labels: pull-request-available
>
> AbfsInputStream.close() can trigger the return of buffers used for active 
> prefetch GET requests into the ReadBufferManager free buffer pool.
> A subsequent prefetch by a different stream in the same process may acquire 
> this same buffer. The still-active GET can then overwrite that stream's 
> prefetched data, and the corrupted data may be returned to the thread 
> reading that stream.
> On releases without the fix for this (3.3.2 to 3.3.4), the bug can be avoided 
> by disabling all prefetching, that is, by setting the readahead queue depth 
> to 0:
> {code}
> fs.azure.readaheadqueue.depth = 0
> {code}
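
The hazard described above can be illustrated with a small single-threaded 
simulation. This is only a sketch: {{Pool}}, {{Prefetch}} and 
{{BufferReuseSketch}} are hypothetical names, not the ABFS classes, and the 
"deferred release" variant is shown purely as a contrast, not as the fix 
adopted for this issue.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class BufferReuseSketch {

    /** Toy free-buffer pool (hypothetical stand-in for a shared pool). */
    static final class Pool {
        private final Deque<byte[]> free = new ArrayDeque<>();

        Pool(int count, int size) {
            for (int i = 0; i < count; i++) {
                free.push(new byte[size]);
            }
        }

        byte[] acquire() { return free.pop(); }

        void release(byte[] buffer) { free.push(buffer); }
    }

    /** A pending prefetch: owns a buffer that a GET fills when it completes. */
    static final class Prefetch {
        final byte[] buffer;
        final byte fill;

        Prefetch(byte[] buffer, byte fill) {
            this.buffer = buffer;
            this.fill = fill;
        }

        /** Simulates the HTTP response body arriving in the buffer. */
        void complete() { Arrays.fill(buffer, fill); }
    }

    /** Buffer is released while its GET is still active, then reused. */
    static boolean unsafeCloseCorrupts() {
        Pool pool = new Pool(1, 4);
        Prefetch a = new Prefetch(pool.acquire(), (byte) 'A'); // stream A prefetch in flight
        pool.release(a.buffer);                                // stream A closed too eagerly
        Prefetch b = new Prefetch(pool.acquire(), (byte) 'B'); // stream B gets the same buffer
        b.complete();                                          // B's data arrives
        a.complete();                                          // A's late GET overwrites it
        return b.buffer[0] != (byte) 'B';
    }

    /** Contrast: the buffer is released only after its GET has completed. */
    static boolean deferredReleaseCorrupts() {
        Pool pool = new Pool(2, 4);
        Prefetch a = new Prefetch(pool.acquire(), (byte) 'A');
        Prefetch b = new Prefetch(pool.acquire(), (byte) 'B'); // distinct buffer
        b.complete();
        a.complete();
        pool.release(a.buffer);                                // released once idle
        return b.buffer[0] != (byte) 'B';
    }

    public static void main(String[] args) {
        System.out.println("eager release corrupts B: " + unsafeCloseCorrupts());
        System.out.println("deferred release corrupts B: " + deferredReleaseCorrupts());
    }
}
```

In the eager case stream B ends up holding stream A's bytes, which is the 
kind of cross-stream corruption the description warns about; in a real 
multi-threaded process the outcome would depend on timing.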


