GitHub user babokim opened a pull request:

    https://github.com/apache/tajo/pull/115

    TAJO-992: Reduce number of hash shuffle output file.

    For this I added the following features:
    - HashShuffleAppender
      A single HashShuffleAppender instance is created per ExecutionBlock
      and partition in each Worker, so all tasks of an execution block that
      run in the same Worker share one appender. Each task's
      HashShuffleWriteExec calls HashShuffleAppender.appends() once every
      'tajo.shuffle.hash.appender.buffer.size' tuples (default: 10,000),
      which keeps locking coarse-grained.
    - Splittable IntermediateEntry
      A large intermediate file is difficult to process with multiple tasks.
      The new IntermediateEntry class carries page metadata: the start
      position and length of each page, with a new page cut every
      'tajo.shuffle.hash.appender.page.volumn-mb' (default: 30MB). The
      Repartitioner class uses this metadata to create an appropriate number
      of tasks.
    - Failure-aware IntermediateEntry
      If a task fails, its tuples in the intermediate file should be removed,
      but this is impossible because they have already been written to the
      file. For this, IntermediateEntry also carries per-task tuple index
      metadata that RawFile's scanner can use. This patch does not use that
      metadata yet; I'll handle it in a separate issue.
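The coarse-grained locking idea behind the shared HashShuffleAppender can be illustrated with a small, self-contained sketch. It is not Tajo's code: `SharedPartitionAppender` and `BufferedTaskWriter` are hypothetical names, tuples are plain strings, and an in-memory `ByteArrayOutputStream` stands in for the real intermediate file. Each task buffers tuples locally and takes the shared lock only once per batch, instead of once per tuple.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CoarseGrainedShuffleSketch {

  /** Hypothetical stand-in for one shared appender per (execution block, partition) in a worker. */
  static final class SharedPartitionAppender {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream(); // stands in for the file
    private long tupleCount = 0;
    private int lockAcquisitions = 0;

    /** The lock is taken once per batch, not once per tuple. */
    synchronized void appendBatch(List<String> tuples) {
      lockAcquisitions++;
      for (String t : tuples) {
        byte[] bytes = (t + "\n").getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        tupleCount++;
      }
    }

    synchronized long tupleCount() { return tupleCount; }
    synchronized int lockAcquisitions() { return lockAcquisitions; }
  }

  /** Hypothetical per-task writer that buffers tuples before touching the shared appender. */
  static final class BufferedTaskWriter {
    private final SharedPartitionAppender appender;
    private final int bufferSize; // plays the role of 'tajo.shuffle.hash.appender.buffer.size'
    private final List<String> buffer = new ArrayList<>();

    BufferedTaskWriter(SharedPartitionAppender appender, int bufferSize) {
      this.appender = appender;
      this.bufferSize = bufferSize;
    }

    void write(String tuple) {
      buffer.add(tuple);
      if (buffer.size() >= bufferSize) {
        flush();
      }
    }

    /** appendBatch copies the bytes immediately, so the buffer can be reused afterwards. */
    void flush() {
      if (!buffer.isEmpty()) {
        appender.appendBatch(buffer);
        buffer.clear();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SharedPartitionAppender appender = new SharedPartitionAppender();
    List<Thread> tasks = new ArrayList<>();
    for (int task = 0; task < 4; task++) {
      final int id = task;
      Thread t = new Thread(() -> {
        BufferedTaskWriter w = new BufferedTaskWriter(appender, 10_000);
        for (int i = 0; i < 25_000; i++) {
          w.write("task" + id + "-row" + i);
        }
        w.flush(); // push the final partial batch
      });
      tasks.add(t);
      t.start();
    }
    for (Thread t : tasks) {
      t.join();
    }
    // prints "100000 tuples, 12 lock acquisitions"
    System.out.println(appender.tupleCount() + " tuples, "
        + appender.lockAcquisitions() + " lock acquisitions");
  }
}
```

With four tasks writing 25,000 tuples each and a 10,000-tuple buffer, the shared lock is taken 12 times in total rather than 100,000 times; the trade-off is that each task holds up to one buffer's worth of tuples in memory.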
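The page metadata of the splittable IntermediateEntry, and how a repartitioner could turn it into tasks, can be sketched as follows. The names `Page`, `PageTracker`, and `split` are hypothetical, and two details are assumptions rather than what the patch does: a page is closed after the append that makes it reach the page size (so pages may slightly exceed it), and consecutive pages are grouped greedily up to a target number of bytes per task.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedIntermediateSketch {

  /** Hypothetical page descriptor: a byte range inside one intermediate file. */
  static final class Page {
    final long offset;
    final long length;

    Page(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }

    @Override
    public String toString() { return "(" + offset + "," + length + ")"; }
  }

  /** Tracks appended bytes and closes a page once it reaches pageSize (assumed policy). */
  static final class PageTracker {
    private final long pageSize; // plays the role of 'tajo.shuffle.hash.appender.page.volumn-mb'
    private final List<Page> pages = new ArrayList<>();
    private long pageStart = 0;
    private long position = 0;

    PageTracker(long pageSize) { this.pageSize = pageSize; }

    void recordAppend(long bytes) {
      position += bytes;
      if (position - pageStart >= pageSize) {
        pages.add(new Page(pageStart, position - pageStart));
        pageStart = position;
      }
    }

    /** Closes the trailing partial page, if any, and returns all pages. */
    List<Page> finish() {
      if (position > pageStart) {
        pages.add(new Page(pageStart, position - pageStart));
        pageStart = position;
      }
      return pages;
    }
  }

  /** Greedily groups consecutive pages into splits of at most about targetBytes; one split per task. */
  static List<List<Page>> split(List<Page> pages, long targetBytes) {
    List<List<Page>> splits = new ArrayList<>();
    List<Page> current = new ArrayList<>();
    long currentBytes = 0;
    for (Page p : pages) {
      if (!current.isEmpty() && currentBytes + p.length > targetBytes) {
        splits.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(p);
      currentBytes += p.length;
    }
    if (!current.isEmpty()) {
      splits.add(current);
    }
    return splits;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    PageTracker tracker = new PageTracker(30 * mb);
    for (int i = 0; i < 100; i++) {
      tracker.recordAppend(mb); // 100 batches of 1 MB each
    }
    List<Page> pages = tracker.finish();
    System.out.println(pages.size() + " pages");                  // prints "4 pages"
    System.out.println(split(pages, 64 * mb).size() + " tasks");  // prints "2 tasks"
  }
}
```

A 100 MB file with 30 MB pages yields pages of 30, 30, 30, and 10 MB, which the greedy grouping packs into two tasks of 60 and 40 MB. Without page boundaries, the whole file could only go to a single task.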
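The per-task tuple index of the failure-aware IntermediateEntry can be illustrated in the same way. Since the description notes that this patch does not use the index yet, the scanner side here is purely a hypothetical illustration of how it might be used; `TaskRange`, `TupleIndex`, and `readablePositions` are invented names, and tuple positions stand in for actual RawFile offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class FailureAwareIndexSketch {

  /** Hypothetical index entry: task `taskId` wrote `count` tuples starting at tuple number `start`. */
  static final class TaskRange {
    final int taskId;
    final long start;
    final long count;

    TaskRange(int taskId, long start, long count) {
      this.taskId = taskId;
      this.start = start;
      this.count = count;
    }
  }

  /** Records, for each appended batch, which task's tuples occupy which positions in the file. */
  static final class TupleIndex {
    private final List<TaskRange> ranges = new ArrayList<>();
    private long next = 0;

    void recordBatch(int taskId, long count) {
      ranges.add(new TaskRange(taskId, next, count));
      next += count;
    }

    List<TaskRange> ranges() { return ranges; }
  }

  /** Returns the tuple positions a scanner should read, skipping tuples written by failed tasks. */
  static List<Long> readablePositions(TupleIndex index, Set<Integer> failedTasks) {
    List<Long> result = new ArrayList<>();
    for (TaskRange r : index.ranges()) {
      if (failedTasks.contains(r.taskId)) {
        continue; // the whole batch belongs to a failed task
      }
      for (long i = r.start; i < r.start + r.count; i++) {
        result.add(i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    TupleIndex index = new TupleIndex();
    index.recordBatch(1, 3); // tuples 0..2
    index.recordBatch(2, 2); // tuples 3..4, task 2 later fails
    index.recordBatch(1, 1); // tuple 5
    System.out.println(readablePositions(index, Set.of(2))); // prints "[0, 1, 2, 5]"
  }
}
```

Because tasks append whole batches, the index needs only one entry per batch rather than one per tuple, which keeps the metadata small even for large intermediate files.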

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/babokim/tajo TAJO-992

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #115
    
----
commit f020bdd0ead06de5903a251fe02a534880420e35
Author: 김형준 <[email protected]>
Date:   2014-08-05T11:26:38Z

    TAJO-992: Reduce number of hash shuffle output file.

commit 36c98e20d118c8f217d7c065b574136847174f8a
Author: 김형준 <[email protected]>
Date:   2014-08-05T12:15:41Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
    
    Conflicts:
        tajo-core/src/main/java/org/apache/tajo/worker/Fetcher.java

commit 06045064ec32b6111ece0abf7343402e419ca608
Author: 김형준 <[email protected]>
Date:   2014-08-06T13:57:35Z

    TAJO-992: Reduce number of hash shuffle output file.

commit 028f498eb18c9094b8ac7641d628ec58e3ffb605
Author: HyoungJun Kim <[email protected]>
Date:   2014-08-11T21:37:52Z

    TAJO-992: Reduce number of hash shuffle output file.
    Splittable IntermediateEntry.

commit e02f0cdf14b502dd949cf9cc5e7c0893ec312e10
Author: HyoungJun Kim <[email protected]>
Date:   2014-08-12T05:56:36Z

    TAJO-992: Reduce number of hash shuffle output file.
    Add some debug logs

commit 2d49339111be67d058158431a94680ab2749000d
Author: HyoungJun Kim <[email protected]>
Date:   2014-08-12T06:09:51Z

    TAJO-992: Reduce number of hash shuffle output file.
    Remove unused log

commit 88775ef40071c8c40eec63371c8e3523658886f0
Author: HyoungJun Kim <[email protected]>
Date:   2014-08-13T11:34:07Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-992
    
    Conflicts:
        tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
        tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java
        tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
        tajo-core/src/main/java/org/apache/tajo/worker/Task.java
        tajo-core/src/main/java/org/apache/tajo/worker/TaskAttemptContext.java
        tajo-core/src/test/java/org/apache/tajo/engine/planner/physical/TestPhysicalPlanner.java
        tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java
        tajo-core/src/test/java/org/apache/tajo/master/TestRepartitioner.java

commit 98e6314ab4974453035647f2bc78940fcb096d9e
Author: HyoungJun Kim <[email protected]>
Date:   2014-08-13T12:36:06Z

    TAJO-992: Reduce number of hash shuffle output file.
    Fix a wrong calculation of Bytes in StorageUnit

----

