[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567606#comment-16567606 ]
genericqa commented on HADOOP-14943:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HADOOP-14943 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-14943 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910618/HADOOP-14943-004.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14983/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-14943
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14943
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch
>
>
> It looks suspiciously like S3A isn't providing the partitioning data needed in {{listLocatedStatus}} and {{getFileBlockLocations()}} needed to break up a file by the blocksize. This will stop tools using the MRv1 APIs doing the partitioning properly if the input format isn't doing its own split logic. FileInputFormat in MRv2 is a bit more configurable about input split calculation & will split up large files.
> But otherwise, the partitioning is being done more by the default values of the executing engine than by any config data from the filesystem about what its "block size" is. NativeAzureFS does a better job; maybe that could be factored out to hadoop-common and reused?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
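The emulation the issue asks for can be sketched roughly as follows: since an object store has no real block placement, synthesize block-sized ranges over the file length, each attributed to a placeholder host. This is a minimal illustration with simplified stand-in types, not Hadoop's actual `BlockLocation`/`FileStatus` API or the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of getFileBlockLocations() emulation for an object store.
 * All class and method names here are hypothetical illustrations.
 */
public class BlockLocationEmulation {

    /** Simplified stand-in for Hadoop's BlockLocation (offset + length only). */
    static final class Block {
        final long offset;
        final long length;
        Block(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Split the range [start, start + len) of a file of fileLen bytes into
     * blockSize-aligned ranges, mimicking what HDFS would report. Each range
     * would carry a single placeholder host ("localhost"), since the store
     * has no placement information to expose.
     */
    static List<Block> emulateBlockLocations(long fileLen, long blockSize,
                                             long start, long len) {
        List<Block> blocks = new ArrayList<>();
        if (fileLen <= 0 || blockSize <= 0 || start >= fileLen || len <= 0) {
            return blocks; // nothing in range
        }
        long end = Math.min(start + len, fileLen);
        // Align the first reported block to a blockSize boundary.
        long blockStart = (start / blockSize) * blockSize;
        while (blockStart < end) {
            long blockLen = Math.min(blockSize, fileLen - blockStart);
            blocks.add(new Block(blockStart, blockLen));
            blockStart += blockSize;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A 10 MB file with a 4 MB "block size" yields three ranges.
        List<Block> blocks =
            emulateBlockLocations(10L << 20, 4L << 20, 0, 10L << 20);
        for (Block b : blocks) {
            System.out.println(b.offset + "+" + b.length);
        }
    }
}
```

With locations synthesized this way, MRv1-style consumers that size splits from `getFileBlockLocations()` see one split candidate per reported block instead of a single whole-file range, which is the behaviour gap the issue describes.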