[
https://issues.apache.org/jira/browse/TAJO-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299458#comment-15299458
]
ASF GitHub Bot commented on TAJO-2069:
--------------------------------------
Github user blrunner commented on the pull request:
https://github.com/apache/tajo/pull/1024#issuecomment-221473568
I updated this PR as follows:
* Remove unnecessary modifications
* Add mockup tests
* Avoid using S3Tablespace with Hadoop versions earlier than 2.6.0.
* Refactor the pom file of the s3 module
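The PR only states the 2.6.0 cutoff. Presumably it relates to the `s3a://` connector (`S3AFileSystem`), which first shipped in Hadoop 2.6.0. Below is a minimal sketch of such a version guard. The class `S3TablespaceVersionCheck` and the method `isS3TablespaceSupported` are hypothetical names, not Tajo's actual API. In a real deployment the version string could come from Hadoop's `VersionInfo.getVersion()`.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tajo: decides whether S3Tablespace may be used.
final class S3TablespaceVersionCheck {
    // Minimum Hadoop version mentioned in the PR description.
    static final int[] MIN_VERSION = {2, 6, 0};

    /** Parses the numeric part of a version such as "2.7.3" or "2.6.0-cdh5.5.0". */
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];   // drop suffixes like "-cdh5.5.0" or "-alpha1"
        String[] parts = core.split("\\.");
        int[] nums = new int[3];                  // missing components default to 0
        for (int i = 0; i < nums.length && i < parts.length; i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    /** Returns true if the given Hadoop version is 2.6.0 or later. */
    static boolean isS3TablespaceSupported(String hadoopVersion) {
        int[] v = parse(hadoopVersion);
        for (int i = 0; i < MIN_VERSION.length; i++) {
            if (v[i] != MIN_VERSION[i]) {
                return v[i] > MIN_VERSION[i];     // first differing component decides
            }
        }
        return true;                              // exactly 2.6.0
    }

    public static void main(String[] args) {
        for (String v : Arrays.asList("2.5.2", "2.6.0", "2.7.3", "3.0.0-alpha1")) {
            System.out.println(v + " -> " + isS3TablespaceSupported(v));
        }
    }
}
```

Comparing the numeric components one by one avoids the trap of plain string comparison, where "2.10.0" would sort before "2.6.0".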
I confirmed that it runs as expected on a local cluster and on EMR. It also
successfully calculated the total data size of a multi-level partitioned table
defined as follows:
```
CREATE external TABLE lineitem_multilevel_p1 (
l_orderkey INT8, l_partkey INT8, l_suppkey INT8, l_linenumber INT8,
l_quantity FLOAT8,
l_extendedprice FLOAT8, l_discount FLOAT8, l_tax FLOAT8, l_returnflag TEXT,
l_linestatus TEXT,
l_commitdate TEXT, l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT
)
USING TEXT WITH ('text.delimiter'='|')
PARTITION BY COLUMN(l_shipdate text, l_receiptdate text)
location 's3a://jhjung-us/tpch/lineitem_multilevel_p1';
```
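S3 has no real directories. A ListObjects request that sets a prefix but no delimiter returns every key under that prefix, including keys inside nested partition "directories" such as `l_shipdate=.../l_receiptdate=.../`, in pages of at most 1,000 keys. The sketch below illustrates that idea by modeling a bucket as an in-memory map from key to size. The class, the keys, and the sizes are hypothetical, and this is not the code in the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: sum object sizes under a table prefix in a flat key space.
final class PrefixSizeSketch {
    /**
     * Sums the sizes of all objects whose key lies under the given prefix.
     * Partition "directories" are just part of the key string, so no recursion is needed.
     */
    static long totalSize(Map<String, Long> objects, String prefix) {
        // Append "/" so that "tpch/t1" does not also match "tpch/t1_backup/...".
        String dirPrefix = prefix.endsWith("/") ? prefix : prefix + "/";
        long total = 0L;
        for (Map.Entry<String, Long> e : objects.entrySet()) {
            if (e.getKey().startsWith(dirPrefix)) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up bucket contents (key -> size in bytes), for illustration only.
        Map<String, Long> bucket = new LinkedHashMap<>();
        bucket.put("tpch/lineitem_multilevel_p1/l_shipdate=1996-03-13/l_receiptdate=1996-03-22/part-0", 120L);
        bucket.put("tpch/lineitem_multilevel_p1/l_shipdate=1996-03-13/l_receiptdate=1996-04-01/part-0", 80L);
        bucket.put("tpch/lineitem_multilevel_p1/l_shipdate=1997-01-28/l_receiptdate=1997-02-10/part-0", 50L);
        bucket.put("tpch/lineitem_multilevel_p1_backup/part-0", 999L); // different table, not counted
        System.out.println(totalSize(bucket, "tpch/lineitem_multilevel_p1")); // 250
    }
}
```

Both partition levels are covered by a single prefix match, which is why a flat listing handles multi-level partitioned tables without walking each partition directory.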
Additionally, I added code comparing this PR with
``FileSystem::getContentSummary`` to my gist:
https://gist.github.com/blrunner/9a8e585ff18a809afb87d8f07d94e345. The result
of ``S3Tablespace::calculateSize`` always equals the result of
``FileSystem::getContentSummary``. I also found that
``FileSystem::getContentSummary`` calls ``FileSystem::listStatus`` recursively,
so the performance difference seems to come from listing directories
recursively.
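To make that cost difference concrete, the sketch below counts listing requests for a hypothetical two-level partitioned table with 50 × 40 leaf partitions holding one 100-byte file each. It assumes that a recursive walk costs one listing request per directory, as observed above for `getContentSummary`, and that a flat prefix listing is paged at 1,000 keys per response. The in-memory `Dir` model and all numbers are illustrative, not measurements from the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cost model comparing recursive directory listing with flat prefix listing.
final class ListingCostSketch {
    /** In-memory stand-in for an S3 "directory" prefix. */
    static final class Dir {
        final List<Long> fileSizes = new ArrayList<>();
        final List<Dir> children = new ArrayList<>();
    }

    static int recursiveCalls = 0;
    static int flatCalls = 0;

    /** Recursive strategy: one listing request per directory, like walking listStatus. */
    static long recursiveSize(Dir dir) {
        recursiveCalls++;                          // listing this directory costs one request
        long total = 0L;
        for (long size : dir.fileSizes) total += size;
        for (Dir child : dir.children) total += recursiveSize(child);
        return total;
    }

    /** Flat strategy: the store returns every key under the prefix, paged by pageSize. */
    static long flatSize(Dir root, int pageSize) {
        List<Long> all = new ArrayList<>();
        collect(root, all);                        // models the store's flat key listing
        flatCalls = Math.max(1, (all.size() + pageSize - 1) / pageSize);
        long total = 0L;
        for (long size : all) total += size;
        return total;
    }

    private static void collect(Dir dir, List<Long> out) {
        out.addAll(dir.fileSizes);
        for (Dir child : dir.children) collect(child, out);
    }

    /** Builds outer x inner leaf partitions, each holding one 100-byte file. */
    static Dir buildTable(int outer, int inner) {
        Dir root = new Dir();
        for (int i = 0; i < outer; i++) {
            Dir shipdate = new Dir();
            for (int j = 0; j < inner; j++) {
                Dir receiptdate = new Dir();
                receiptdate.fileSizes.add(100L);
                shipdate.children.add(receiptdate);
            }
            root.children.add(shipdate);
        }
        return root;
    }

    public static void main(String[] args) {
        Dir table = buildTable(50, 40);            // 2,000 leaf partitions
        long a = recursiveSize(table);
        long b = flatSize(table, 1000);
        System.out.println(a + " " + b + " recursive=" + recursiveCalls + " flat=" + flatCalls);
    }
}
```

Both strategies return the same total (200,000 bytes), consistent with the two results agreeing. However, the recursive walk issues 1 + 50 + 2,000 = 2,051 listing requests, while the flat listing needs only 2 pages. Because every listing is a round trip to S3, the request count grows with the number of partition directories under the recursive approach and only with the number of objects under the flat one.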
> Implement finding the total size of all objects in a bucket with AWS SDK.
> -------------------------------------------------------------------------
>
> Key: TAJO-2069
> URL: https://issues.apache.org/jira/browse/TAJO-2069
> Project: Tajo
> Issue Type: Sub-task
> Components: Catalog, QueryMaster, S3, Storage
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
> Fix For: 0.12.0
>
>
> See the title and TAJO-2023.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)