[
https://issues.apache.org/jira/browse/HADOOP-18546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642826#comment-17642826
]
ASF GitHub Bot commented on HADOOP-18546:
-----------------------------------------
steveloughran commented on PR #5176:
URL: https://github.com/apache/hadoop/pull/5176#issuecomment-1336158644
Sorry, I should have been clearer: a local Spark build and spark-shell process
is ideal for replication and validation. As all splits are processed in
different worker threads in that one process, it recreates the exact failure mode.
The script linked below is one you can take and tune for your system; it uses
the mkcsv command in the cloudstore JAR.
I am going to add it as a scalatest suite in the same module:
https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/scripts/validating-csv-record-io.sc
> disable purging list of in progress reads in abfs stream closed
> ---------------------------------------------------------------
>
> Key: HADOOP-18546
> URL: https://issues.apache.org/jira/browse/HADOOP-18546
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.3.4
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> Turn off the pruning of in-progress reads in
> ReadBufferManager::purgeBuffersForStream.
> This will ensure that active prefetches for a closed stream complete. They will
> then move to the completed list and stay there until evicted by timeout, but
> at least prefetching will be safe.
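
The behaviour described in the issue can be sketched with a small model. This is not the Hadoop code: the class PrefetchPurgeSketch, its Read type, and the methods purgeForStream, markCompleted and evictExpired are all hypothetical names, and the handling of the queued and completed lists on purge is an assumption made only for illustration. The one point taken from the issue text is that purging on stream close leaves in-progress reads alone, so they finish, land in the completed list, and are later removed by timeout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a shared prefetch buffer manager. */
class PrefetchPurgeSketch {

    /** One prefetch, tagged with the id of the stream that requested it. */
    static final class Read {
        final String streamId;
        long completedAtMillis = -1; // set when the read finishes

        Read(String streamId) {
            this.streamId = streamId;
        }
    }

    final List<Read> queued = new ArrayList<>();
    final List<Read> inProgress = new ArrayList<>();
    final List<Read> completed = new ArrayList<>();

    /** Called when a stream closes. */
    synchronized void purgeForStream(String streamId) {
        // Assumption: queued and completed entries of the stream are dropped.
        queued.removeIf(r -> r.streamId.equals(streamId));
        completed.removeIf(r -> r.streamId.equals(streamId));
        // In-progress reads are deliberately left alone so that the worker
        // thread filling the buffer can finish normally.
    }

    /** Called by a worker thread when its read finishes. */
    synchronized void markCompleted(Read read, long nowMillis) {
        if (inProgress.remove(read)) {
            read.completedAtMillis = nowMillis;
            completed.add(read);
        }
    }

    /** Drops completed reads at least timeoutMillis old; returns the number evicted. */
    synchronized int evictExpired(long nowMillis, long timeoutMillis) {
        int before = completed.size();
        completed.removeIf(r -> nowMillis - r.completedAtMillis >= timeoutMillis);
        return before - completed.size();
    }

    public static void main(String[] args) {
        PrefetchPurgeSketch manager = new PrefetchPurgeSketch();
        Read read = new Read("stream-1");
        manager.inProgress.add(read);

        manager.purgeForStream("stream-1");   // stream closed mid-prefetch
        System.out.println("in progress after purge: " + manager.inProgress.size());

        manager.markCompleted(read, 1_000L);  // the prefetch finishes anyway
        System.out.println("completed: " + manager.completed.size());

        int evicted = manager.evictExpired(5_000L, 3_000L);
        System.out.println("evicted by timeout: " + evicted);
    }
}
```

The trade-off in this model matches the one named in the issue: a buffer belonging to a closed stream is held a little longer than strictly needed, in exchange for never pulling a buffer out from under a thread that is still filling it.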
--
This message was sent by Atlassian Jira
(v8.20.10#820010)