[ 
https://issues.apache.org/jira/browse/TAJO-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050153#comment-15050153
 ] 

ASF GitHub Bot commented on TAJO-1952:
--------------------------------------

Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/846#issuecomment-163512120
  
    Tested the patch by comparing query runs before and after the ``PartitionFileFragment`` 
implementation, as follows:
    
    * Dataset : TPCH-100G
    * Tajo Cluster : 1 Master, 6 Workers
    * Queries: Q1, Q3, Q5, Q6, Q7, Q8, Q9, Q10
    * Tajo version before the ``PartitionFileFragment`` implementation:
     - 0.11.1-SNAPSHOT
    * Tajo version after the ``PartitionFileFragment`` implementation:
     - 0.12.0-SNAPSHOT (partitions exist in the catalog)
     - 0.12.0-SNAPSHOT (partitions don't exist in the catalog)
    
    The following were the same before and after the patch:
    
    * The number and order of execution blocks
    * The number of stages in each execution block
    * The number of rows in a result
    * All tuples in a result (excluding a few floating-point values)
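
    The comment does not say how the tuples were compared. As a minimal sketch 
    only, the Java below shows one way to check two result sets for equality 
    while tolerating small floating-point differences. The names 
    ``ResultComparator``, ``sameCell`` and ``sameResult``, the relative-tolerance 
    rule, and the assumption that rows arrive as non-null strings in the same 
    order are all hypothetical and are not part of Tajo.

```java
import java.util.List;

// Hypothetical helper (not part of Tajo): compares two query results that
// were exported as rows of string cells, in the same row order.
public class ResultComparator {

    // Cells match if they are identical strings, or if both parse as
    // numbers that differ by at most relTol relative to the larger magnitude.
    static boolean sameCell(String a, String b, double relTol) {
        if (a.equals(b)) {
            return true;
        }
        try {
            double x = Double.parseDouble(a);
            double y = Double.parseDouble(b);
            double scale = Math.max(Math.abs(x), Math.abs(y));
            return Math.abs(x - y) <= relTol * scale;
        } catch (NumberFormatException e) {
            // Non-numeric cells must match exactly.
            return false;
        }
    }

    // Results match if they have the same number of rows, each row has the
    // same number of cells, and every pair of cells matches.
    static boolean sameResult(List<String[]> before, List<String[]> after,
                              double relTol) {
        if (before.size() != after.size()) {
            return false;
        }
        for (int i = 0; i < before.size(); i++) {
            String[] r1 = before.get(i);
            String[] r2 = after.get(i);
            if (r1.length != r2.length) {
                return false;
            }
            for (int j = 0; j < r1.length; j++) {
                if (!sameCell(r1[j], r2[j], relTol)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

    With a tolerance of 0.0 this degenerates to an exact numeric comparison, 
    so the tolerance has to be chosen per query (an assumption, not something 
    the test above prescribes).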


> Implement PartitionFileFragment
> -------------------------------
>
>                 Key: TAJO-1952
>                 URL: https://issues.apache.org/jira/browse/TAJO-1952
>             Project: Tajo
>          Issue Type: Improvement
>          Components: Planner/Optimizer, Storage
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>             Fix For: 0.12.0, 0.11.1
>
>         Attachments: TAJO-1952.patch
>
>
> Currently, PartitionedTableScanNode contains the list of partitions, and 
> this list seems to cause the following problems:
> 1. Duplicate information: A Task contains a Fragment, which specifies the 
> target directory or target file for scanning. The partition paths in the 
> list are therefore already written to the Fragment. 
> 2. Network resources: When scanning lots of partitions, the list occupies 
> network resources, for example, several hundred kilobytes or more. This 
> looks unnecessary because the Fragment already has the partition paths.
> I want to fix the above problems by implementing a new Fragment called 
> PartitionedFileFragment. Currently, I'm planning the implementation as 
> follows:
> * PartitionedFileFragment will borrow from FileFragment and contain the 
> partition path and the partition key values.  
> * Remove the path array of partitions from PartitionedTableScanNode. 
> * Implement a method for getting filtered partition directories in 
> FileTableSpace.
> * Implement a method for making PartitionedFileFragment array.
> * Before making splits, call the above method and use its result for making splits.
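
The plan quoted above can be illustrated with a rough Java sketch of a fragment that carries its partition path and partition key values. Everything here is hypothetical: the class ``PartitionFragmentSketch``, the nested ``PartitionedFragment`` type, its fields, and both methods are stand-ins and not Tajo's actual classes. The sketch also assumes partition directories are named in ``key=value`` form.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not Tajo's actual classes): a file fragment that also
// records which partition it belongs to, so the scan node does not need to
// ship a separate list of partition paths.
public class PartitionFragmentSketch {

    public static final class PartitionedFragment {
        public final String filePath;       // file to scan
        public final long startOffset;      // first byte of the split
        public final long length;           // number of bytes in the split
        public final String partitionPath;  // directory of the partition
        public final Map<String, String> partitionKeys; // e.g. {col=value}

        PartitionedFragment(String filePath, long startOffset, long length,
                            String partitionPath,
                            Map<String, String> partitionKeys) {
            this.filePath = filePath;
            this.startOffset = startOffset;
            this.length = length;
            this.partitionPath = partitionPath;
            this.partitionKeys = partitionKeys;
        }
    }

    // Extracts key/value pairs from path segments of the assumed form
    // "key=value", keeping them in path order. Other segments are ignored.
    static Map<String, String> parsePartitionKeys(String partitionPath) {
        Map<String, String> keys = new LinkedHashMap<>();
        for (String segment : partitionPath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                keys.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return keys;
    }

    // Builds one fragment for a file inside an already filtered partition
    // directory.
    static PartitionedFragment makeFragment(String partitionPath,
                                            String fileName,
                                            long startOffset, long length) {
        return new PartitionedFragment(partitionPath + "/" + fileName,
                startOffset, length, partitionPath,
                parsePartitionKeys(partitionPath));
    }
}
```

In this sketch each fragment is self-describing, which corresponds to the quoted motivation of removing the path array from PartitionedTableScanNode.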



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
