[ https://issues.apache.org/jira/browse/HADOOP-16202?focusedWorklogId=758614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758614 ]
ASF GitHub Bot logged work on HADOOP-16202:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 19/Apr/22 16:43
Start Date: 19/Apr/22 16:43
Worklog Time Spent: 10m
Work Description: steveloughran commented on code in PR #2584:
URL: https://github.com/apache/hadoop/pull/2584#discussion_r853282822
##########
hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/mapreduce/StreamInputFormat.java:
##########
@@ -62,14 +58,8 @@ public RecordReader<Text, Text>
createRecordReader(InputSplit genericSplit,
context.progress();
// Open the file and seek to the start of the split
- Path path = split.getPath();
- FileSystem fs = path.getFileSystem(conf);
- // open the file
- final FutureDataInputStreamBuilder builder = fs.openFile(path);
- FutureIOSupport.propagateOptions(builder, conf,
- MRJobConfig.INPUT_FILE_OPTION_PREFIX,
- MRJobConfig.INPUT_FILE_MANDATORY_PREFIX);
- FSDataInputStream in = FutureIOSupport.awaitFuture(builder.build());
+ FileSystem fs = split.getPath().getFileSystem(conf);
+ FSDataInputStream in = fs.open(split.getPath());
Review Comment:
Good question. Not sure why I reverted this (accidentally?). Reverted back.
FWIW, I don't think hadoop-streaming is a module that gets used any more, not now
that we have things like Flink. Cutting it would be good.
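
For context, the lines removed in the hunk above copy job-level settings into the openFile() builder by key prefix. The following is a minimal stdlib-only sketch of that prefix-propagation idea, not the Hadoop implementation: the class `OptionPropagation`, the method `propsWithPrefix`, and the prefixes `job.input.option` and `job.input.must` are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class OptionPropagation {

    /**
     * Return every entry whose key starts with the prefix, with the prefix
     * stripped. A trailing "." is appended to the prefix if it is missing.
     * (Hypothetical helper; it only models the idea of prefix propagation.)
     */
    public static Map<String, String> propsWithPrefix(Map<String, String> conf,
                                                      String prefix) {
        String p = prefix.endsWith(".") ? prefix : prefix + ".";
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            String key = e.getKey();
            // Skip keys that do not match, or that are nothing but the prefix.
            if (key.startsWith(p) && key.length() > p.length()) {
                out.put(key.substring(p.length()), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new TreeMap<>();
        // Assumed sample settings; the prefixes are illustrative only.
        conf.put("job.input.option.fs.s3a.open.option.length", "1024");
        conf.put("job.input.must.fs.example.policy", "sequential");
        conf.put("unrelated.key", "ignored");

        // Entries under the first prefix would become optional builder
        // settings, entries under the second would become mandatory ones.
        System.out.println("optional:  " + propsWithPrefix(conf, "job.input.option"));
        System.out.println("mandatory: " + propsWithPrefix(conf, "job.input.must"));
    }
}
```

Replacing this with a plain `fs.open()` call means none of the prefixed settings reach the stream, which appears to be the concern behind the review question.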
Issue Time Tracking
-------------------
Worklog Id: (was: 758614)
Time Spent: 21.5h (was: 21h 20m)
> Enhance openFile() for better read performance against object stores
> ---------------------------------------------------------------------
>
> Key: HADOOP-16202
> URL: https://issues.apache.org/jira/browse/HADOOP-16202
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs, fs/s3, tools/distcp
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 21.5h
> Remaining Estimate: 0h
>
> The {{openFile()}} builder API lets us add new options when reading a file.
> Add an option {{"fs.s3a.open.option.length"}} which takes a long and allows
> the length of the file to be declared. If set, *no check for the existence of
> the file is issued when opening the file*.
> Also: allow {{withFileStatus()}} to take any FileStatus implementation, rather
> than only S3AFileStatus, and do not check that the path matches the path being
> opened. This is needed to support viewFS-style wrapping and mounting.
> Finally, adopt {{openFile()}} where appropriate, to stop clusters with S3A reads
> switched to random IO from killing download/localization:
> * fs shell copyToLocal
> * distcp
> * IOUtils.copy
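
The effect of declaring the length up front can be sketched with a small stdlib-only model. This is an illustration under stated assumptions, not the S3A code: `DeclaredLengthOpen`, `FakeStore`, `head`, and `open` are hypothetical names, and the "store" is an in-memory map that only counts metadata probes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

public class DeclaredLengthOpen {

    /** Hypothetical in-memory store that counts HEAD-style metadata probes. */
    public static final class FakeStore {
        private final Map<String, byte[]> objects = new HashMap<>();
        public int headRequests = 0;

        public void put(String key, byte[] data) {
            objects.put(key, data);
        }

        /** Probe the store for the object's length; fails if it is absent. */
        public long head(String key) {
            headRequests++;
            byte[] data = objects.get(key);
            if (data == null) {
                throw new IllegalStateException("not found: " + key);
            }
            return data.length;
        }
    }

    /**
     * Resolve the length to use when opening: trust the declared length if
     * one was supplied, otherwise issue a probe against the store.
     */
    public static long open(FakeStore store, String key, OptionalLong declaredLength) {
        return declaredLength.isPresent()
            ? declaredLength.getAsLong()
            : store.head(key);
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.put("data/part-0000", new byte[2048]);

        long probed = open(store, "data/part-0000", OptionalLong.empty());
        System.out.println("probed length=" + probed
            + " headRequests=" + store.headRequests);

        long declared = open(store, "data/part-0000", OptionalLong.of(2048));
        System.out.println("declared length=" + declared
            + " headRequests=" + store.headRequests);
    }
}
```

In this model, a caller that already knows the length (for example from a directory listing) skips the probe entirely. The trade-off is that `open` no longer notices a missing object.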
--
This message was sent by Atlassian Jira
(v8.20.7#820007)