[
https://issues.apache.org/jira/browse/TAJO-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050153#comment-15050153
]
ASF GitHub Bot commented on TAJO-1952:
--------------------------------------
Github user blrunner commented on the pull request:
https://github.com/apache/tajo/pull/846#issuecomment-163512120
Tested the patch by comparing runs before and after the
``PartitionFileFragment`` implementation, as follows:
* Dataset : TPCH-100G
* Tajo Cluster : 1 Master, 6 Workers
* Queries: Q1, Q3, Q5, Q6, Q7, Q8, Q9, Q10
* Tajo version before the ``PartitionFileFragment`` implementation:
- 0.11.1-SNAPSHOT
* Tajo version after the ``PartitionFileFragment`` implementation:
- 0.12.0-SNAPSHOT (partitions exist in the catalog)
- 0.12.0-SNAPSHOT (partitions don't exist in the catalog)
The following were identical in all cases:
* The number and order of execution blocks
* The number of stages in each execution block
* The number of rows in each result
* All tuples in each result (except for a few floating-point values)
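
The comment does not say how the tuples were compared. As a minimal sketch
of one way to compare two result sets while tolerating small floating-point
differences, the helper below checks rows in order and applies a relative
tolerance to ``Double`` columns. ``ResultComparator``, ``sameResult``,
``sameTuple`` and the tolerance parameter are hypothetical names for
illustration and are not part of Tajo.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical helper for comparing two query results; not part of Tajo. */
final class ResultComparator {
    private ResultComparator() {}

    /** True if both results have the same number of rows and each row matches in order. */
    static boolean sameResult(List<Object[]> before, List<Object[]> after, double relTol) {
        if (before.size() != after.size()) {
            return false;
        }
        for (int i = 0; i < before.size(); i++) {
            if (!sameTuple(before.get(i), after.get(i), relTol)) {
                return false;
            }
        }
        return true;
    }

    /** Compares Double columns with a relative tolerance and all other columns exactly. */
    static boolean sameTuple(Object[] a, Object[] b, double relTol) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] instanceof Double && b[i] instanceof Double) {
                double x = (Double) a[i];
                double y = (Double) b[i];
                // Scale the tolerance by the larger magnitude, with a floor of 1.0.
                double scale = Math.max(1.0, Math.max(Math.abs(x), Math.abs(y)));
                if (Math.abs(x - y) > relTol * scale) {
                    return false;
                }
            } else if (!Objects.equals(a[i], b[i])) {
                return false;
            }
        }
        return true;
    }
}
```

A relative tolerance is used here because aggregate values such as TPC-H
sums can be large, and distributed aggregation may add partial sums in a
different order between runs.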
> Implement PartitionFileFragment
> -------------------------------
>
> Key: TAJO-1952
> URL: https://issues.apache.org/jira/browse/TAJO-1952
> Project: Tajo
> Issue Type: Improvement
> Components: Planner/Optimizer, Storage
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
> Fix For: 0.12.0, 0.11.1
>
> Attachments: TAJO-1952.patch
>
>
> Currently, PartitionedTableScanNode contains the list of partitions, and it
> seems to me that this list causes the following problems:
> 1. Duplicate information: A Task contains a Fragment, which specifies the
> target directory or target file for scanning, so the partition paths are
> already written to the Fragment.
> 2. Network resources: When scanning lots of partitions, the list occupies
> network resources, for example several hundred kilobytes or more. This looks
> unnecessary because the Fragment already has the partition paths.
> I want to address these problems by implementing a new Fragment called
> PartitionedFileFragment. I'm currently planning the implementation as
> follows:
> * PartitionedFileFragment will borrow from FileFragment and contain the
> partition path and the partition key values.
> * Remove the path array of partitions from PartitionedTableScanNode.
> * Implement a method for getting the filtered partition directories in
> FileTableSpace.
> * Implement a method for building the PartitionedFileFragment array.
> * Before making splits, call the above method and use its result to make the
> splits.
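
The plan quoted above can be illustrated with a rough sketch. This is not
the code from TAJO-1952.patch: the classes below are simplified stand-ins,
"borrow" is modeled as subclassing (one possible reading), and partition
directories are assumed to use a ``column=value`` layout. All class, field
and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for a file fragment (one byte range of one file). Not Tajo's FileFragment. */
class SketchFileFragment {
    final String filePath;
    final long startOffset;
    final long length;

    SketchFileFragment(String filePath, long startOffset, long length) {
        this.filePath = filePath;
        this.startOffset = startOffset;
        this.length = length;
    }
}

/** Carries the partition directory and its key values along with the file range. */
final class SketchPartitionedFileFragment extends SketchFileFragment {
    final String partitionPath;
    final String partitionKeys; // e.g. "a=1/b=2"

    SketchPartitionedFileFragment(String filePath, long startOffset, long length,
                                  String partitionPath, String partitionKeys) {
        super(filePath, startOffset, length);
        this.partitionPath = partitionPath;
        this.partitionKeys = partitionKeys;
    }
}

final class SketchFragmentBuilder {
    private SketchFragmentBuilder() {}

    /** Returns the trailing "column=value" components, e.g. "/w/t/a=1/b=2" gives "a=1/b=2". */
    static String partitionKeysOf(String partitionPath) {
        String[] parts = partitionPath.split("/");
        int first = parts.length;
        while (first > 0 && parts[first - 1].contains("=")) {
            first--;
        }
        return String.join("/", Arrays.copyOfRange(parts, first, parts.length));
    }

    /** Builds one fragment per file, treating each whole file as a single split. */
    static List<SketchPartitionedFileFragment> makeFragments(Map<String, Long> fileLengths) {
        List<SketchPartitionedFileFragment> fragments = new ArrayList<>();
        for (Map.Entry<String, Long> entry : fileLengths.entrySet()) {
            String file = entry.getKey();
            int slash = file.lastIndexOf('/');
            // The partition directory is taken to be the file's parent directory.
            String dir = slash < 0 ? "" : file.substring(0, slash);
            fragments.add(new SketchPartitionedFileFragment(
                file, 0L, entry.getValue(), dir, partitionKeysOf(dir)));
        }
        return fragments;
    }
}
```

Because each fragment carries its own partition path and key values in this
sketch, the scan node would no longer need to ship a separate array of
partition paths, which is the duplication the issue describes.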
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)