[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364706#comment-16364706 ]
Steve Loughran commented on HADOOP-14943: ----------------------------------------- if you return a specific host for the data, then it reports to the scheduler the preferred location of the work...the schedulers will try and place the work there and wait a bit before giving up. What you are measuring there is how long spark waits before rescheduling You don't want location affinity in object stores, not really ... though [~ehiggs] and [~Thomas Demoor] might have different data > Add common getFileBlockLocations() emulation for object stores, including S3A > ----------------------------------------------------------------------------- > > Key: HADOOP-14943 > URL: https://issues.apache.org/jira/browse/HADOOP-14943 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 > Affects Versions: 2.8.1 > Reporter: Steve Loughran > Assignee: Steve Loughran > Priority: Major > Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, > HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch > > > It looks suspiciously like S3A isn't providing the partitioning data needed > in {{listLocatedStatus}} and {{getFileBlockLocations()}} needed to break up a > file by the blocksize. This will stop tools using the MRv1 APIS doing the > partitioning properly if the input format isn't doing it own split logic. > FileInputFormat in MRv2 is a bit more configurable about input split > calculation & will split up large files. but otherwise, the partitioning is > being done more by the default values of the executing engine, rather than > any config data from the filesystem about what its "block size" is, > NativeAzureFS does a better job; maybe that could be factored out to > hadoop-common and reused? -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org