steveloughran opened a new pull request #2530: URL: https://github.com/apache/hadoop/pull/2530
…tten collected by spark

This is a PoC which, having implemented it, I don't think is viable. Yes, we can fix up `getFileStatus()` so it reads the header. It even knows to always bypass S3Guard (no inconsistencies to worry about any more). But the blast radius of the change is too big.

I'm worried about distcp or any other code which goes:

```java
len = getFileStatus(path).getLen();
open(path).readFully(0, len, dest);
```

You'll get an EOF there. Find the file through a listing and you'll be OK, provided S3Guard isn't updated with that getFileStatus result, which I have seen happen. The ordering of probes in `ITestMagicCommitProtocol.validateTaskAttemptPathAfterWrite` needs to be list before getFileStatus, so the S3Guard table is updated from the list.

Overall: danger. Even without S3Guard there's risk.

Anyway, this shows it can be done. And I think there's merit in a leaner patch which attaches the marker but doesn't do any fixup. That would let us add an API call `getObjectHeaders(path) -> Future<Map<String, String>>` and then use it to do the lookup. We can implement the probe for ABFS and S3, and add a `hasPathCapability` probe for it as well as an interface the FS can implement (which passthrough filesystems would need to do).

Change-Id: If56213c0c5d8ab696d2d89b48ad52874960b0920
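To make the EOF hazard concrete, here is a minimal standalone simulation (plain Java, no Hadoop dependencies; the length value is illustrative): when the length a status probe reports is larger than the bytes actually stored, as with a zero-byte magic marker whose header advertises the eventual data length, a distcp-style `readFully()` fails with `EOFException`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofHazardDemo {

  /**
   * Simulate a distcp-style copy: trust a reported length and readFully.
   * @param storedBytes the bytes actually present in the store
   * @param reportedLen the length a (fixed-up) status probe reported
   * @return true if the full read succeeds, false on premature EOF
   */
  static boolean copySucceeds(byte[] storedBytes, int reportedLen)
      throws IOException {
    byte[] dest = new byte[reportedLen];
    try (DataInputStream in =
             new DataInputStream(new ByteArrayInputStream(storedBytes))) {
      in.readFully(dest, 0, reportedLen);
      return true;
    } catch (EOFException e) {
      // The stream ended before reportedLen bytes: the failure mode above.
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    // A zero-byte marker whose header advertises 128 bytes of data:
    System.out.println(copySucceeds(new byte[0], 128));   // prints false
    // A normal object whose status length matches its content:
    System.out.println(copySucceeds(new byte[128], 128)); // prints true
  }
}
```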
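The leaner header-probe API could look roughly like the sketch below. To be clear, this is my illustration only: the interface name `ObjectHeaderSource`, the capability string, the use of `String` paths instead of `org.apache.hadoop.fs.Path`, and the header name in the usage note are all assumptions, not part of this patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/**
 * Sketch of a header-probe interface a FileSystem (or a passthrough
 * filesystem wrapping one) could implement. Names are illustrative.
 */
interface ObjectHeaderSource {

  /** Hypothetical capability string for hasPathCapability() probes. */
  String OBJECT_HEADERS_CAPABILITY = "fs.capability.object.headers";

  /** Asynchronously fetch the headers of the object at the given path. */
  CompletableFuture<Map<String, String>> getObjectHeaders(String path);

  /** Probe for support, mirroring FileSystem.hasPathCapability(). */
  boolean hasPathCapability(String path, String capability);
}

/** Minimal in-memory implementation, for illustration only. */
class InMemoryHeaderSource implements ObjectHeaderSource {

  private final Map<String, Map<String, String>> headers = new HashMap<>();

  /** Attach a header to a path, as a store would on object creation. */
  void putHeader(String path, String name, String value) {
    headers.computeIfAbsent(path, p -> new HashMap<>()).put(name, value);
  }

  @Override
  public CompletableFuture<Map<String, String>> getObjectHeaders(String path) {
    // A real S3/ABFS implementation would issue an async HEAD request here.
    return CompletableFuture.completedFuture(
        headers.getOrDefault(path, new HashMap<>()));
  }

  @Override
  public boolean hasPathCapability(String path, String capability) {
    return OBJECT_HEADERS_CAPABILITY.equals(capability);
  }
}
```

A caller would first probe `hasPathCapability()` and, only if supported, fetch the headers and look up the marker's length entry, rather than trusting `getFileStatus().getLen()`.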
