[
https://issues.apache.org/jira/browse/TAJO-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144089#comment-15144089
]
ASF GitHub Bot commented on TAJO-2030:
--------------------------------------
GitHub user blrunner closed the pull request at:
https://github.com/apache/tajo/pull/932
> Use list S3 files using AmazonS3Client instead of using S3A
> -----------------------------------------------------------
>
> Key: TAJO-2030
> URL: https://issues.apache.org/jira/browse/TAJO-2030
> Project: Tajo
> Issue Type: Sub-task
> Components: S3
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
> Fix For: 0.12.0
>
>
> AWS S3 provides a bulk listing API. It takes the common prefix of all input
> paths as a parameter and returns all the objects whose keys start with that
> prefix, in blocks of 1000.
> If we use AmazonS3Client to list S3 files instead of S3A, this should
> improve performance. To test this idea, I used PrestoFileSystem in place of
> S3AFileSystem. When pruning partition filters, PrestoFileSystem was much
> faster than S3AFileSystem.
> Here are my benchmark results for the following queries:
> {code}
> 1 partition : select count(*) from lineitem where l_shipdate = '1992-01-02';
> 30 partitions: select count(*) from lineitem where l_shipdate > '1992-01-01'
> and l_shipdate < '1992-02-01';
> 90 partitions: select count(*) from lineitem where l_shipdate >=
> '1992-01-01' and l_shipdate < '1992-04-01';
> 151 partitions: select count(*) from lineitem where l_shipdate >=
> '1992-01-01' and l_shipdate < '1992-06-01';
> {code}
> ||# of partitions||PrestoFileSystem (ms)||S3AFileSystem (ms)||
> |1|677|800|
> |30|2753|6977|
> |90|6825|13772|
> |151|13834|25701|
> For reference, I used the TPC-H 1 GB dataset and made the {{l_shipdate}}
> column of the {{lineitem}} table the partition column.
> I think there are two ways to resolve this:
> - Borrow PrestoFileSystem and the related code from Presto
> - Implement the necessary code in S3TableSpace, using Presto as a reference
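
The bulk-listing idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Tajo or Presto: an in-memory sorted set stands in for the S3 bucket, no AWS SDK calls are made, and the names BulkListingSketch, commonPrefix, listPage and listFiles are hypothetical. It shows the three steps the description implies: compute the common prefix of the partition paths, fetch keys under that prefix in blocks of 1000, and keep only the keys that fall under a requested partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch only: a TreeSet stands in for the bucket; a real version would
// fetch each page from the S3 listing API instead of calling listPage().
final class BulkListingSketch {
    static final int PAGE_SIZE = 1000; // block size mentioned in the issue

    /** Longest common string prefix of all input paths. */
    static String commonPrefix(List<String> paths) {
        if (paths.isEmpty()) return "";
        String prefix = paths.get(0);
        for (String p : paths) {
            int i = 0;
            int max = Math.min(prefix.length(), p.length());
            while (i < max && prefix.charAt(i) == p.charAt(i)) i++;
            prefix = prefix.substring(0, i);
        }
        return prefix;
    }

    /** One page of keys starting with prefix, strictly after marker (null = from the start). */
    static List<String> listPage(SortedSet<String> bucket, String prefix, String marker) {
        // marker + "\0" is the smallest string strictly greater than marker.
        SortedSet<String> tail = bucket.tailSet(marker == null ? prefix : marker + "\0");
        List<String> page = new ArrayList<>();
        for (String key : tail) {
            // Keys sharing a prefix are contiguous in sorted order, so stop at the first miss.
            if (!key.startsWith(prefix) || page.size() == PAGE_SIZE) break;
            page.add(key);
        }
        return page;
    }

    /** Pages through everything under the common prefix, keeping keys under a requested path. */
    static List<String> listFiles(SortedSet<String> bucket, List<String> partitionPaths) {
        String prefix = commonPrefix(partitionPaths);
        List<String> result = new ArrayList<>();
        String marker = null;
        while (true) {
            List<String> page = listPage(bucket, prefix, marker);
            for (String key : page) {
                for (String path : partitionPaths) {
                    if (key.startsWith(path)) { result.add(key); break; }
                }
            }
            if (page.size() < PAGE_SIZE) break; // a short page means the listing is exhausted
            marker = page.get(page.size() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSet<String> bucket = new TreeSet<>();
        for (int day = 1; day <= 9; day++) {
            bucket.add("lineitem/l_shipdate=1992-01-0" + day + "/part-0");
        }
        List<String> wanted = Arrays.asList(
            "lineitem/l_shipdate=1992-01-02/",
            "lineitem/l_shipdate=1992-01-03/");
        System.out.println(commonPrefix(wanted));
        System.out.println(listFiles(bucket, wanted));
    }
}
```

One trade-off visible in the sketch: the common prefix can be much broader than the requested partitions, so the listing may return keys that are then filtered out. The benefit is that the number of round trips depends on the number of objects under the prefix rather than on the number of partitions.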
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)