[
https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266856#comment-16266856
]
Yonger edited comment on HADOOP-14943 at 11/27/17 2:15 PM:
-----------------------------------------------------------
[[email protected]] I remember there was some discussion about how to configure
the fake host list, such as returning the endpoint, the compute hosts, or a
star; is that right? I am not sure I have understood these points fully.
I just tested these four cases with a 1TB dataset on query42 of TPC-DS;
results are below (seconds):
||default localhost||endpoint||star||compute host list||
|16|16|16|28|
From this result, performance is equal across these cases, except when
returning the compute host list.
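For context, the emulation being benchmarked synthesizes block locations for an object store from a configured host list. A minimal standalone sketch of the idea, in plain Java with no Hadoop dependency (the real code would return {{org.apache.hadoop.fs.BlockLocation}} from {{getFileBlockLocations()}}; the class and method names here are illustrative, not the actual patch API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of fake block-location emulation for an object store: carve a
 * file into blockSize-sized ranges and attach the same configured host
 * list (e.g. "localhost", the endpoint, "*", or the compute hosts) to
 * every range. Illustrative only, not the HADOOP-14943 patch itself.
 */
public class FakeBlockLocations {

    /** One synthetic block: a byte range plus the hosts claimed to hold it. */
    public record Block(long offset, long length, List<String> hosts) {}

    /** Split [start, start+len) into blockSize-aligned fake blocks. */
    public static List<Block> fakeLocations(long start, long len,
                                            long blockSize, List<String> hosts) {
        List<Block> blocks = new ArrayList<>();
        long end = start + len;
        for (long off = start; off < end; off += blockSize) {
            blocks.add(new Block(off, Math.min(blockSize, end - off), hosts));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A 1 GiB object with a 128 MiB "block size" yields 8 fake blocks.
        List<Block> blocks = fakeLocations(0, 1L << 30, 128L << 20,
                Arrays.asList("localhost"));
        System.out.println(blocks.size()); // prints 8
    }
}
```

Whatever host list is returned, the byte ranges are the same; the host strings only matter to schedulers that try to place work "near" the data, which is presumably why the compute-host-list case above behaves differently.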
> Add common getFileBlockLocations() emulation for object stores, including S3A
> -----------------------------------------------------------------------------
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch,
> HADOOP-14943-002.patch, HADOOP-14943-003.patch
>
>
> It looks suspiciously like S3A isn't providing the partitioning data in
> {{listLocatedStatus}} and {{getFileBlockLocations()}} needed to break up a
> file by the block size. This will stop tools using the MRv1 APIs from doing
> the partitioning properly if the input format isn't doing its own split
> logic. FileInputFormat in MRv2 is a bit more configurable about input split
> calculation and will split up large files, but otherwise the partitioning
> is being done more by the default values of the executing engine than by
> any config data from the filesystem about what its "block size" is.
> NativeAzureFS does a better job; maybe that could be factored out to
> hadoop-common and reused?
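The split-size clamping the description alludes to is, in MRv2's FileInputFormat, splitSize = max(minSize, min(maxSize, blockSize)). A standalone sketch of that arithmetic (the real logic lives in {{org.apache.hadoop.mapreduce.lib.input.FileInputFormat}}; the values below are illustrative), showing why a filesystem that reports no useful block size yields one split per file:

```java
/**
 * Sketch of MRv2 FileInputFormat split sizing:
 *   splitSize = max(minSize, min(maxSize, blockSize))
 * If an object store effectively reports the whole file as one "block"
 * (no getFileBlockLocations() emulation), the default min/max leave a
 * single split; a sane block size (e.g. fs.s3a.block.size) lets large
 * files be split for parallel processing.
 */
public class SplitSizing {

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static int numSplits(long fileLen, long splitSize) {
        return (int) ((fileLen + splitSize - 1) / splitSize); // ceiling division
    }

    public static void main(String[] args) {
        long fileLen = 1L << 30;            // 1 GiB file
        long min = 1, max = Long.MAX_VALUE; // illustrative defaults
        // Whole file treated as one block: a single split, no parallelism.
        System.out.println(numSplits(fileLen,
                computeSplitSize(fileLen, min, max)));   // prints 1
        // A 128 MiB block size: eight splits.
        System.out.println(numSplits(fileLen,
                computeSplitSize(128L << 20, min, max))); // prints 8
    }
}
```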
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)