[ 
https://issues.apache.org/jira/browse/TAJO-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299458#comment-15299458
 ] 

ASF GitHub Bot commented on TAJO-2069:
--------------------------------------

Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/1024#issuecomment-221473568
  
    I updated this PR as follows:
    * Removed unnecessary modifications
    * Added mockup tests
    * Avoided using S3Tablespace with Hadoop versions earlier than 2.6.0
    * Refactored the pom file of the s3 module
    
    I confirmed that it ran as expected on a local cluster and on EMR. It also
    calculated the volume of a multi-level partitioned table successfully, using
    the following table:
    ```
    CREATE EXTERNAL TABLE lineitem_multilevel_p1 (
      l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8,
      l_quantity FLOAT8, l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8,
      l_returnflag TEXT, l_linestatus TEXT,
      l_commitdate TEXT, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
    )
    USING TEXT WITH ('text.delimiter'='|')
    PARTITION BY COLUMN(l_shipdate TEXT, l_receiptdate TEXT)
    LOCATION 's3a://jhjung-us/tpch/lineitem_multilevel_p1';
    ```
    
    Additionally, I added code that compares this PR with
    ``FileSystem::getContentSummary`` to my gist:
    https://gist.github.com/blrunner/9a8e585ff18a809afb87d8f07d94e345. I found that
    the result of ``S3Tablespace::calculateSize`` always equals the result of
    ``FileSystem::getContentSummary``. I also found that
    ``FileSystem::listStatus`` is called recursively inside
    ``FileSystem::getContentSummary``. The performance difference therefore seems
    to come from listing directories recursively.
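
To illustrate why recursive listing can cost more, here is a minimal, self-contained sketch. It is not the code from this PR or from Hadoop: `ListingSketch`, its methods, and the sample keys are hypothetical, and the bucket is simulated with an in-memory sorted map. It assumes the faster path lists keys by prefix in pages, and it only counts how many list requests each strategy would issue.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a bucket; not Tajo, Hadoop, or AWS SDK code.
class ListingSketch {
    // Object key -> size in bytes, kept sorted like an object-store key listing.
    final TreeMap<String, Long> objects = new TreeMap<>();
    int requests = 0; // number of simulated list requests issued so far

    void put(String key, long size) { objects.put(key, size); }

    // Directory-style walk: one list request per directory, then recurse into children.
    long sizeByRecursiveListing(String dir) {
        requests++;
        long total = 0;
        TreeSet<String> subdirs = new TreeSet<>();
        for (Map.Entry<String, Long> e : objects.tailMap(dir, true).entrySet()) {
            String key = e.getKey();
            if (!key.startsWith(dir)) break; // keys sharing the prefix are contiguous
            String rest = key.substring(dir.length());
            int slash = rest.indexOf('/');
            if (slash < 0) total += e.getValue();                 // file directly in dir
            else subdirs.add(dir + rest.substring(0, slash + 1)); // child directory
        }
        for (String sub : subdirs) total += sizeByRecursiveListing(sub);
        return total;
    }

    // Flat walk: sum every key under the prefix, one request per page of keys.
    long sizeByPrefixListing(String prefix, int pageSize) {
        long total = 0;
        int keys = 0;
        for (Map.Entry<String, Long> e : objects.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            total += e.getValue();
            keys++;
        }
        // Even an empty result costs one request.
        requests += Math.max(1, (keys + pageSize - 1) / pageSize);
        return total;
    }

    // Made-up two-level partition layout: 3 x 2 partitions, one 100-byte file each.
    static ListingSketch sample() {
        ListingSketch s = new ListingSketch();
        for (String ship : new String[] {"1996-01-01", "1996-01-02", "1996-01-03"}) {
            for (String receipt : new String[] {"1996-02-01", "1996-02-02"}) {
                s.put("tpch/t/l_shipdate=" + ship + "/l_receiptdate=" + receipt + "/part-0", 100L);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        ListingSketch s = sample();
        long recursive = s.sizeByRecursiveListing("tpch/t/");
        int recursiveRequests = s.requests;
        s.requests = 0;
        long flat = s.sizeByPrefixListing("tpch/t/", 1000);
        System.out.println("recursive: " + recursive + " bytes, " + recursiveRequests + " requests");
        System.out.println("prefix:    " + flat + " bytes, " + s.requests + " requests");
    }
}
```

Both strategies return the same total, but the recursive walk issues one request per directory level and partition, while the prefix walk issues one request per page of keys. The gap grows with the number of partition directories.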


> Implement finding the total size of all objects in a bucket with AWS SDK.
> -------------------------------------------------------------------------
>
>                 Key: TAJO-2069
>                 URL: https://issues.apache.org/jira/browse/TAJO-2069
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: Catalog, QueryMaster, S3, Storage
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>             Fix For: 0.12.0
>
>
> See the title and TAJO-2023.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
