[
https://issues.apache.org/jira/browse/HADOOP-16202?focusedWorklogId=521118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-521118
]
ASF GitHub Bot logged work on HADOOP-16202:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 07/Dec/20 12:12
Start Date: 07/Dec/20 12:12
Worklog Time Spent: 10m
Work Description: steveloughran commented on pull request #2168:
URL: https://github.com/apache/hadoop/pull/2168#issuecomment-739880426
The latest patch rounds things off. This thing is ready to go in.
* We now have the option to specify the start and end of splits; the input
formats in the MR client do this.
* Everywhere in the code where we explicitly download sequential datasets, we
now request sequential IO. (Actually, I've just realised `hadoop fs -head <path>`
should request random IO as well as declare split lengths... we don't want a
full GET.)
It's important that FS implementations don't rely on split length to set the
max file length, because splits are allowed to overrun to ensure a whole
record/block is read. Apps which pass split info down to worker processes
(Hive etc.) need to pass in the file size too if they want to save the HEAD
request. The split length could still be used by the input streams if they can
think of a way, for example:
1. For sequential IO: end of content length = min(split-end, file-length)
for that initial request.
2. For random IO: assume the split end is the initial EOF.
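
The two rules above could be sketched roughly as follows. This is an illustration only, not code from the patch: the class `SplitLengthSketch`, the enum `ReadPolicy`, the method `initialContentEnd` and the use of a negative value for "length unknown" are all hypothetical names and assumptions.

```java
/** Hypothetical helper for illustration; not part of Hadoop. */
final class SplitLengthSketch {

  /** The two IO modes discussed above; the enum itself is an assumption. */
  enum ReadPolicy { SEQUENTIAL, RANDOM }

  /** Assumed marker for "file length not known yet". */
  static final long UNKNOWN = -1L;

  /**
   * Estimate where the content ends for the initial request only.
   * The result is a hint: splits may overrun, so it must never be
   * treated as the real maximum file length.
   */
  static long initialContentEnd(ReadPolicy policy, long splitEnd, long fileLength) {
    if (policy == ReadPolicy.SEQUENTIAL) {
      // Rule 1: min(split-end, file-length); without a length, use the split end.
      return fileLength >= 0 ? Math.min(splitEnd, fileLength) : splitEnd;
    }
    // Rule 2: random IO treats the split end as the initial EOF estimate.
    return splitEnd;
  }
}
```

Because the value is only an initial estimate, a stream built this way would still need to cope with reads that run past it.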
Because openFile() declares that FNFEs can be delayed until reads, we could
also see if we could do an async HEAD request while processing that first
GET/HEAD, so as to have the final file length without blocking. That would make
streams more complex, but at least now we have the option.
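
A minimal sketch of that idea, using only the JDK: the length probe runs in the background and any failure surfaces when a caller actually needs the length. `AsyncLengthSketch` and its `headRequest` parameter are hypothetical stand-ins for whatever a real stream would use to issue the HEAD; nothing here is taken from the patch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.LongSupplier;

/** Hypothetical sketch for illustration; not part of Hadoop. */
final class AsyncLengthSketch {

  /** Completes with the file length, or exceptionally if the probe fails. */
  private final CompletableFuture<Long> length;

  AsyncLengthSketch(LongSupplier headRequest) {
    // Start the probe on a background thread; the caller does not block here.
    this.length = CompletableFuture.supplyAsync(headRequest::getAsLong);
  }

  /** True once the probe has finished successfully. */
  boolean lengthKnown() {
    return length.isDone() && !length.isCompletedExceptionally();
  }

  /**
   * Blocks only if the probe is still running. A failed probe is
   * rethrown here, wrapped in a CompletionException, i.e. the error
   * is delayed until something needs the length.
   */
  long awaitLength() {
    return length.join();
  }
}
```

The extra state (a pending future plus a possibly deferred failure) is the added stream complexity mentioned above.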
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 521118)
Time Spent: 3h 40m (was: 3.5h)
> Stabilize openFile() and adopt internally
> -----------------------------------------
>
> Key: HADOOP-16202
> URL: https://issues.apache.org/jira/browse/HADOOP-16202
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs, fs/s3, tools/distcp
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> The {{openFile()}} builder API lets us add new options when reading a file.
> Add an option {{"fs.s3a.open.option.length"}} which takes a long and allows
> the length of the file to be declared. If set, *no check for the existence of
> the file is issued when opening the file*.
> Also: withFileStatus() to take any FileStatus implementation, rather than
> only S3AFileStatus, and not check that the path matches the path being
> opened. Needed to support viewFS-style wrapping and mounting.
> Adopt where appropriate to stop clusters with S3A reads switched to
> random IO from killing download/localization:
> * fs shell copyToLocal
> * distcp
> * IOUtils.copy