[
https://issues.apache.org/jira/browse/HADOOP-18521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18034293#comment-18034293
]
ASF GitHub Bot commented on HADOOP-18521:
-----------------------------------------
github-actions[bot] commented on PR #5133:
URL: https://github.com/apache/hadoop/pull/5133#issuecomment-3470795037
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> ABFS ReadBufferManager buffer sharing across concurrent HTTP requests
> ---------------------------------------------------------------------
>
> Key: HADOOP-18521
> URL: https://issues.apache.org/jira/browse/HADOOP-18521
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/azure
> Affects Versions: 3.3.2, 3.3.3, 3.3.4
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.3.5
>
> Attachments: HADOOP-18521 ABFS ReadBufferManager buffer sharing
> across concurrent HTTP requests.pdf, validating-csv-record-io.sc
>
>
> {{AbfsInputStream.close()}} can return buffers still in use by active
> prefetch GET requests to the ReadBufferManager free buffer pool.
> A subsequent prefetch by a different stream in the same process may acquire
> the same buffer; the still-in-flight GET can then overwrite that stream's
> prefetched data, which may in turn be returned to the other thread.
> The full analysis is in the document attached to this JIRA.
> The issue is fixed in Hadoop 3.3.5.
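> To see why this risks corruption, here is a minimal sketch (not the ABFS
> code itself, and with hypothetical names) of the hazard pattern: a pooled
> buffer is handed back to the free list while an in-flight read can still
> write into it, so a second consumer may observe the first request's bytes.
> {code:java}
> import java.util.concurrent.*;
>
> public class BufferReuseRace {
>     // stand-in for the ReadBufferManager free buffer pool
>     static final BlockingQueue<byte[]> freePool = new LinkedBlockingQueue<>();
>
>     public static void main(String[] args) throws Exception {
>         byte[] buf = new byte[4];
>         ExecutorService exec = Executors.newSingleThreadExecutor();
>         // stream A's in-flight prefetch GET, which will fill buf later
>         Future<?> inFlight = exec.submit(() -> {
>             try { Thread.sleep(100); } catch (InterruptedException e) { }
>             buf[0] = 'A'; // late write from stream A's network request
>         });
>         // stream A is closed: its buffer goes straight back to the pool,
>         // with no attempt to cancel or wait for the in-flight request
>         freePool.add(buf);
>         // stream B acquires the same buffer and fills it with its own data
>         byte[] reused = freePool.take();
>         reused[0] = 'B';
>         inFlight.get(); // stream A's write lands after stream B's
>         System.out.println((char) reused[0]); // prints 'A': B's data corrupted
>         exec.shutdown();
>     }
> }
> {code}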
> h2. Emergency fix through site configuration
> On releases without the fix (3.3.2-3.3.4), the bug can be avoided by
> disabling all prefetching:
> {code}
> fs.azure.readaheadqueue.depth = 0
> {code}
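> As an illustration, a minimal sketch of applying the same setting
> programmatically, assuming a standard Hadoop {{Configuration}} and a
> hypothetical container/account URI:
> {code:java}
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
>
> public class DisableAbfsReadahead {
>     public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         // 0 disables all ABFS read-ahead, avoiding the shared-buffer bug
>         conf.setInt("fs.azure.readaheadqueue.depth", 0);
>         // hypothetical container/account; substitute your own
>         FileSystem fs = FileSystem.get(
>             URI.create("abfs://container@account.dfs.core.windows.net/"), conf);
>         System.out.println("readahead queue depth = "
>             + fs.getConf().getInt("fs.azure.readaheadqueue.depth", -1));
>     }
> }
> {code}
> In cluster deployments the same key can be set once in {{core-site.xml}}.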
> h2. Automated probes for risk of exposure
> The [cloudstore|https://github.com/steveloughran/cloudstore] diagnostics JAR
> has a command
> [safeprefetch|https://github.com/steveloughran/cloudstore/blob/trunk/src/main/site/safeprefetch.md]
> which probes an ABFS client for vulnerability to this bug. It does this
> through {{PathCapabilities.hasPathCapability()}} probes, and can be invoked on
> the command line to validate the version/configuration.
> Consult [the
> source|https://github.com/steveloughran/cloudstore/blob/trunk/src/main/java/org/apache/hadoop/fs/store/abfs/SafePrefetch.java#L96]
> to see how to do this programmatically.
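> As a rough sketch of such a probe (the capability string below is an
> assumption drawn from the fix; confirm it against the linked source):
> {code:java}
> import java.io.IOException;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public final class SafePrefetchProbe {
>     // assumed capability name; verify against SafePrefetch.java
>     static final String CAPABILITY = "fs.azure.capability.readahead.safe";
>
>     /** True if this (ABFS) filesystem declares the readahead fix. */
>     public static boolean isReadaheadSafe(FileSystem fs) throws IOException {
>         return fs.hasPathCapability(new Path("/"), CAPABILITY);
>     }
> }
> {code}
> If the probe returns false, treat the client as vulnerable unless prefetching
> has been disabled as above.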
> Note also that the tool's
> [mkcsv|https://github.com/steveloughran/cloudstore/blob/trunk/src/main/site/mkcsv.md]
> command can be used to generate the multi-GB CSV files needed to trigger the
> condition and so verify that the issue exists.
> h2. Microsoft Announcement
> {code}
> From: Sneha Vijayarajan
> Subject: RE: Alert ! ABFS Driver - Possible data corruption on read path
> Hi,
> One of the contributions made to ABFS Driver has a potential to cause data
> corruption on read
> path.
> Please check if the below change is part of any of your releases:
> HADOOP-17156. Purging the buffers associated with input streams during
> close() by mukund-thakur
> · Pull Request #3285 · apache/hadoop (github.com)
> RCA: Scenario that can lead to data corruption:
> Driver allocates a bunch of prefetch buffers at init and are shared by
> different instances of
> InputStreams created within that process. These prefetch buffers could be in
> 3 stages –
> * In ReadAheadQueue : request for prefetch logged
> * In ProgressList : Work has begun to talk to backend store to get the
> requested data
> * In CompletedList: Prefetch data is now available for consumption.
> When multiple InputStreams have prefetch buffers across these states and
> close is triggered on
> any InputStream/s, the commit above will remove buffers allotted to
> respective stream from all
> the 3 lists and also declare that the buffers are available for new
> prefetches to happen, but
> no action to cancel/prevent buffer from being updated with ongoing network
> request is done.
> Data corruption can happen if one such freed up buffer from InProgressList is
> allotted to a new
> prefetch request and then the buffer got filled up with the previous stream’s
> network request.
> Mitigation: If this change is present in any release, kindly help communicate
> to your customers
> to immediately set below config to 0 in their clusters. This will disable
> prefetches which can
> have an impact on perf but will prevent the possibility of data corruption.
> fs.azure.readaheadqueue.depth: Sets the readahead queue depth in
> AbfsInputStream. In case the
> set value is negative the read ahead queue depth will be set as
> Runtime.getRuntime().availableProcessors(). By default the value will be 2.
> To disable
> readaheads, set this value to 0. If your workload is doing only random reads
> (non-sequential)
> or you are seeing throttling, you may try setting this value to 0.
> Next steps: We are getting help to post the notifications for this in Apache
> groups. Work on
> HotFix is also ongoing. Will update this thread once the change is checked in.
> Please reach out for any queries or clarifications.
> Thanks,
> Sneha Vijayarajan
> {code}
>