[ 
https://issues.apache.org/jira/browse/TAJO-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15144089#comment-15144089
 ] 

ASF GitHub Bot commented on TAJO-2030:
--------------------------------------

Github user blrunner closed the pull request at:

    https://github.com/apache/tajo/pull/932


> Use list S3 files using AmazonS3Client instead of using S3A
> -----------------------------------------------------------
>
>                 Key: TAJO-2030
>                 URL: https://issues.apache.org/jira/browse/TAJO-2030
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: S3
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>             Fix For: 0.12.0
>
>
> AWS S3 provides a bulk listing API. It takes the common prefix of all input 
> paths as a parameter and returns all the objects whose keys start with 
> that prefix, in blocks of 1000.
> If we use AmazonS3Client to list S3 files instead of S3A, listing 
> performance should improve. To test this idea, I adopted PrestoFileSystem 
> instead of S3AFileSystem. During partition pruning, PrestoFileSystem 
> was much faster than S3AFileSystem.
> Here are my benchmark results for the following queries:
> {code}
> -- 1 partition
> select count(*) from lineitem where l_shipdate = '1992-01-02';
> -- 30 partitions
> select count(*) from lineitem where l_shipdate > '1992-01-01'
>   and l_shipdate < '1992-02-01';
> -- 90 partitions
> select count(*) from lineitem where l_shipdate >= '1992-01-01'
>   and l_shipdate < '1992-04-01';
> -- 151 partitions
> select count(*) from lineitem where l_shipdate >= '1992-01-01'
>   and l_shipdate < '1992-06-01';
> {code}
> ||# of partitions||PrestoFileSystem (ms)||S3AFileSystem (ms)||
> |1|677|800|
> |30|2753|6977|
> |90|6825|13772|
> |151|13834|25701|
> For reference, I used the TPC-H 1 GB dataset and made the {{l_shipdate}} 
> column of the {{lineitem}} table the partition column.
> I think there are two ways to resolve this:
> - Borrow PrestoFileSystem and related code from Presto
> - Implement the necessary code in S3TableSpace, using Presto as a reference
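
The bulk listing idea in the description can be sketched roughly as follows. This is only an in-memory illustration, not Tajo or Presto code: `list_page` is a hypothetical stand-in for a single S3 listing call, and the key layout (`lineitem/l_shipdate=.../part-NNNN`) is an assumption made up for the example. It shows one prefix-based listing over the common prefix of the selected partitions, paged in blocks of 1000, followed by filtering down to the wanted partitions.

```python
import os

PAGE_SIZE = 1000  # block size mentioned in the issue description


def list_page(sorted_keys, prefix, marker=None, max_keys=PAGE_SIZE):
    """Hypothetical stand-in for one bulk-listing call.

    Returns (keys, next_marker); next_marker is None on the last page.
    """
    matching = [
        k for k in sorted_keys
        if k.startswith(prefix) and (marker is None or k > marker)
    ]
    page = matching[:max_keys]
    truncated = len(matching) > max_keys
    return page, (page[-1] if truncated else None)


def list_partitions(sorted_keys, partition_paths):
    """List every object under the given partition paths with one paged scan."""
    # String-wise common prefix of all input paths, used as the listing prefix.
    prefix = os.path.commonprefix(partition_paths)
    wanted = tuple(partition_paths)
    found, marker, calls = [], None, 0
    while True:
        page, marker = list_page(sorted_keys, prefix, marker)
        calls += 1
        # The common prefix may cover unwanted partitions, so filter them out.
        found.extend(k for k in page if k.startswith(wanted))
        if marker is None:
            return found, calls


# Assumed layout: 31 daily partitions with 50 objects each (1550 keys in total).
keys = sorted(
    f"lineitem/l_shipdate=1992-01-{day:02d}/part-{n:04d}"
    for day in range(1, 32)
    for n in range(50)
)
paths = [
    "lineitem/l_shipdate=1992-01-02/",
    "lineitem/l_shipdate=1992-01-30/",
]
found, calls = list_partitions(keys, paths)
print(len(found), calls)  # → 100 2
```

In this sketch the number of listing calls depends on how many keys fall under the common prefix, not on how many partitions were selected; whether that matches the measured gap between PrestoFileSystem and S3AFileSystem is not something the sketch can confirm.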



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
