[ https://issues.apache.org/jira/browse/HADOOP-16202?focusedWorklogId=758614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758614 ]

ASF GitHub Bot logged work on HADOOP-16202:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Apr/22 16:43
            Start Date: 19/Apr/22 16:43
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on code in PR #2584:
URL: https://github.com/apache/hadoop/pull/2584#discussion_r853282822


##########
hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/mapreduce/StreamInputFormat.java:
##########
@@ -62,14 +58,8 @@ public RecordReader<Text, Text> createRecordReader(InputSplit genericSplit,
     context.progress();
 
     // Open the file and seek to the start of the split
-    Path path = split.getPath();
-    FileSystem fs = path.getFileSystem(conf);
-    // open the file
-    final FutureDataInputStreamBuilder builder = fs.openFile(path);
-    FutureIOSupport.propagateOptions(builder, conf,
-        MRJobConfig.INPUT_FILE_OPTION_PREFIX,
-        MRJobConfig.INPUT_FILE_MANDATORY_PREFIX);
-    FSDataInputStream in = FutureIOSupport.awaitFuture(builder.build());
+    FileSystem fs = split.getPath().getFileSystem(conf);
+    FSDataInputStream in = fs.open(split.getPath());

Review Comment:
   Good question. Not sure why I reverted it (accidentally?). Reverted back.
   
   FWIW, I don't think the hadoop-streaming module gets used any more, not now
that we have things like Flink. Cutting it would be good.
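
The removed lines show the pattern being restored: open the file through the openFile() builder, copy job-level settings into it as optional (opt) or mandatory (must) options, then await the future. Below is a minimal pure-Java sketch of the option-propagation step. It assumes that FutureIOSupport.propagateOptions() strips the prefix from each matching configuration key and forwards the remainder to opt() or must(). OptionSink and the prefixes used in main() are hypothetical stand-ins, not Hadoop APIs, so the sketch runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch (not Hadoop code) of prefix-based option propagation into an
 * openFile()-style builder. OptionSink is a hypothetical stand-in for the
 * builder's opt()/must() methods.
 */
public class PropagateOptionsSketch {

  /** Hypothetical builder stand-in that records optional and mandatory options. */
  static final class OptionSink {
    final Map<String, String> optional = new LinkedHashMap<>();
    final Map<String, String> mandatory = new LinkedHashMap<>();

    void opt(String key, String value) { optional.put(key, value); }

    void must(String key, String value) { mandatory.put(key, value); }
  }

  /** Copies each config entry under the prefix, prefix stripped, into the sink. */
  static void propagate(OptionSink sink, Map<String, String> conf,
      String prefix, boolean mandatory) {
    // Treat "a.b" and "a.b." as the same prefix.
    String p = prefix.endsWith(".") ? prefix : prefix + ".";
    for (Map.Entry<String, String> e : conf.entrySet()) {
      String key = e.getKey();
      if (!key.startsWith(p) || key.length() == p.length()) {
        continue; // not under this prefix, or nothing left after stripping
      }
      String option = key.substring(p.length());
      if (mandatory) {
        sink.must(option, e.getValue());
      } else {
        sink.opt(option, e.getValue());
      }
    }
  }

  /** Assumed order, mirroring the two-prefix call: optional first, then mandatory. */
  static OptionSink propagateBoth(Map<String, String> conf,
      String optionalPrefix, String mandatoryPrefix) {
    OptionSink sink = new OptionSink();
    propagate(sink, conf, optionalPrefix, false);
    propagate(sink, conf, mandatoryPrefix, true);
    return sink;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    // Placeholder prefixes and keys; a real job would use
    // MRJobConfig.INPUT_FILE_OPTION_PREFIX / INPUT_FILE_MANDATORY_PREFIX.
    conf.put("example.input.option.read.policy", "sequential");
    conf.put("example.input.must.buffer.size", "65536");
    conf.put("unrelated.key", "ignored");
    OptionSink sink = propagateBoth(conf, "example.input.option", "example.input.must");
    System.out.println(sink.optional);  // {read.policy=sequential}
    System.out.println(sink.mandatory); // {buffer.size=65536}
  }
}
```

With this design, the job configuration never needs to know a filesystem's option names in advance, because anything under the prefix is passed through. Under Hadoop's builder semantics, a must() key that the filesystem does not recognise is expected to fail the open instead of being silently ignored.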





Issue Time Tracking
-------------------

    Worklog Id:     (was: 758614)
    Time Spent: 21.5h  (was: 21h 20m)

> Enhance openFile() for better read performance against object stores 
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-16202
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16202
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, fs/s3, tools/distcp
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 21.5h
>  Remaining Estimate: 0h
>
> The {{openFile()}} builder API lets us add new options when reading a file.
> Add an option {{"fs.s3a.open.option.length"}} which takes a long and allows
> the length of the file to be declared. If set, *no check for the existence of
> the file is issued when opening the file*.
> Also: withFileStatus() to take any FileStatus implementation, rather than
> only S3AFileStatus, and not check that the path matches the path being
> opened. This is needed to support viewFS-style wrapping and mounting.
> Adopt these options where appropriate, to stop clusters whose S3A reads are
> switched to random IO from killing download/localization performance:
> * fs shell copyToLocal
> * distcp
> * IOUtils.copy
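
The issue text describes two hints that let an open skip the metadata probe an object store otherwise needs: a declared length and a caller-supplied status. The sketch below models this with a fake store that counts HEAD requests. It assumes, as the text states, that either hint means no existence check is made at open time, and that a supplied status is trusted without comparing its path. FakeStore, StatusLike and open() are hypothetical; only the option name comes from the issue. In this model, a missing object opened with a hint would only be discovered on the first read.

```java
import java.io.FileNotFoundException;
import java.util.Map;

/**
 * Hypothetical model (not S3A code) of why declaring the length, or passing
 * a status, lets an open skip the existence probe on an object store.
 */
public class DeclaredLengthSketch {

  /** Option name as proposed in the issue text. */
  static final String LENGTH_OPTION = "fs.s3a.open.option.length";

  /** Any status-like value; only its length is used, its path is never checked. */
  static final class StatusLike {
    final String path;
    final long length;

    StatusLike(String path, long length) {
      this.path = path;
      this.length = length;
    }
  }

  /** Fake store that counts HEAD (metadata) requests. */
  static final class FakeStore {
    final Map<String, Long> objects;
    int headRequests = 0;

    FakeStore(Map<String, Long> objects) { this.objects = objects; }

    long head(String key) throws FileNotFoundException {
      headRequests++;
      Long len = objects.get(key);
      if (len == null) {
        throw new FileNotFoundException(key);
      }
      return len;
    }
  }

  /** Returns the length the opened stream would use; status may be null. */
  static long open(FakeStore store, String key, Map<String, String> options,
      StatusLike status) throws FileNotFoundException {
    if (status != null) {
      // Trust the supplied length; status.path is deliberately not compared
      // with key, so a viewFS-style wrapper can pass its own status through.
      return status.length;
    }
    String declared = options.get(LENGTH_OPTION);
    if (declared != null) {
      // Length declared up front: no existence check is issued.
      return Long.parseLong(declared);
    }
    // No hint: one HEAD request to learn the length and fail fast if missing.
    return store.head(key);
  }

  public static void main(String[] args) throws FileNotFoundException {
    FakeStore store = new FakeStore(Map.of("data/part-0000", 1024L));
    open(store, "data/part-0000", Map.of(), null);
    open(store, "data/part-0000", Map.of(LENGTH_OPTION, "1024"), null);
    open(store, "data/part-0000", Map.of(),
        new StatusLike("viewfs://cluster/mnt/part-0000", 1024L));
    System.out.println(store.headRequests); // 1: only the unhinted open probed
  }
}
```

For bulk copies such as distcp or copyToLocal, the source listing typically already holds each file's status. Passing that status through would save one request per file in this model.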



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
