[ https://issues.apache.org/jira/browse/TAJO-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116216#comment-14116216 ]
Hyunsik Choi commented on TAJO-983:
-----------------------------------
Hi Mai,
Thank you for your contribution. Overall, your patch looks good to me, and it
passes all unit tests.
I tested the patch on a single local cluster. Hash shuffle worked correctly, as
follows:
{code}
default> select count(*) from lineitem;
Progress: 0%, response time: 0.193 sec
Progress: 0%, response time: 0.194 sec
Progress: 0%, response time: 0.395 sec
Progress: 4%, response time: 0.796 sec
Progress: 10%, response time: 1.398 sec
Progress: 19%, response time: 2.2 sec
Progress: 30%, response time: 3.202 sec
Progress: 41%, response time: 4.205 sec
Progress: 100%, response time: 4.994 sec
?count
-------------------------------
6001216
(1 rows, 4.994 sec, 8 B selected)
{code}
However, the patch causes a file-not-found error when I test range shuffle, as
follows:
{noformat}
default> select l_orderkey, l_partkey from lineitem order by l_orderkey;
Progress: 0%, response time: 0.622 sec
Progress: 0%, response time: 0.623 sec
Progress: 0%, response time: 0.825 sec
Progress: 0%, response time: 1.239 sec
Progress: 0%, response time: 1.841 sec
Progress: 4%, response time: 2.642 sec
Progress: 6%, response time: 3.644 sec
Progress: 10%, response time: 4.646 sec
Progress: 15%, response time: 5.647 sec
Progress: 17%, response time: 6.65 sec
Progress: 21%, response time: 7.652 sec
Progress: 26%, response time: 8.654 sec
Progress: 28%, response time: 9.656 sec
Progress: 32%, response time: 10.678 sec
Progress: 36%, response time: 11.68 sec
Progress: 41%, response time: 12.681 sec
Progress: 43%, response time: 13.684 sec
Progress: 47%, response time: 14.687 sec
Progress: 50%, response time: 15.69 sec
Progress: 50%, response time: 16.692 sec
Progress: 75%, response time: 17.694 sec
Progress: 75%, response time: 18.696 sec
ERROR: /tmp/tajo-hyunsik/tmpdir/q_1409368383100_0001/output/1/11_0/output/output (No such file or directory)
java.io.FileNotFoundException: /tmp/tajo-hyunsik/tmpdir/q_1409368383100_0001/output/1/11_0/output/output (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:146)
	at org.apache.tajo.storage.RawFile$RawFileScanner.init(RawFile.java:85)
	at org.apache.tajo.engine.planner.physical.ExternalSortExec$PairWiseMerger.init(ExternalSortExec.java:618)
	at org.apache.tajo.engine.planner.physical.ExternalSortExec$PairWiseMerger.init(ExternalSortExec.java:618)
	at org.apache.tajo.engine.planner.physical.ExternalSortExec$PairWiseMerger.init(ExternalSortExec.java:617)
	at org.apache.tajo.engine.planner.physical.ExternalSortExec.next(ExternalSortExec.java:299)
	at org.apache.tajo.engine.planner.physical.StoreTableExec.next(StoreTableExec.java:112)
	at org.apache.tajo.worker.Task.run(Task.java:454)
	at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:445)
	at java.lang.Thread.run(Thread.java:744)
{noformat}
Also, the patch copies the code that builds local file paths from
TajoPullServerService::messageReceived. It would be great if you could refactor
that part into separate methods and reuse them in both places.
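As a rough illustration of that refactoring, the path construction could live in one small helper that both TajoPullServerService::messageReceived and the new local-read code call. This is only a sketch: IntermediatePaths and its methods are hypothetical names, and the directory layout ({queryId}/output/{ebSeq}/{taskId}_{attemptId}/output/output) is an assumption inferred from the path in the error above, not checked against Tajo's code.

{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not Tajo's API. The layout below is an assumption
// inferred from the path printed in the FileNotFoundException above.
final class IntermediatePaths {
  private IntermediatePaths() {}

  /** Directory assumed to hold all intermediate output of one execution block. */
  static Path executionBlockOutputDir(Path baseDir, String queryId, int ebSeq) {
    return baseDir.resolve(queryId)
        .resolve("output")
        .resolve(Integer.toString(ebSeq));
  }

  /** Single output file assumed to be written by one task attempt. */
  static Path taskOutputFile(Path baseDir, String queryId, int ebSeq,
                             int taskId, int attemptId) {
    return executionBlockOutputDir(baseDir, queryId, ebSeq)
        .resolve(taskId + "_" + attemptId)
        .resolve("output")
        .resolve("output");
  }
}
{code}

With one shared helper, a layout change only needs to be made in one place, so the pull server and the local reader cannot drift apart.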
In addition, it would be easier for reviewers to leave comments if you uploaded
your patch to Review Board (https://reviews.apache.org/dashboard/) or opened a
GitHub pull request (https://github.com/apache/tajo/pulls).
Thanks!
> Worker should directly read Intermediate data stored in localhost rather than fetching
> --------------------------------------------------------------------------------------
>
> Key: TAJO-983
> URL: https://issues.apache.org/jira/browse/TAJO-983
> Project: Tajo
> Issue Type: Bug
> Components: data shuffle
> Reporter: Hyunsik Choi
> Assignee: Mai Hai Thanh
> Attachments: TAJO-983.140820.0.patch.txt, TAJO-983.140822.patch.txt,
> TAJO-983.140825.1.patch.txt
>
>
> Currently, a worker always fetches all intermediate data via Fetcher and then
> stores it in the local file system, even though some of the intermediate data
> are already stored in the local file system. This is inefficient: it causes
> unnecessary I/O and occupies extra storage. We should improve it.
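The improvement described above amounts to partitioning a task's fetch list before any network transfer: entries served by the worker's own pull server are read from local disk, and only the rest go through Fetcher. A minimal Java sketch, assuming each fetch is identified by the pull server's host and port; FetchTarget, LocalFirstPlan, and split are hypothetical names, not Tajo's API.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: FetchTarget and LocalFirstPlan are illustrative names,
// not Tajo classes.
final class FetchTarget {
  final String host;          // host of the pull server holding the data
  final int port;             // port of that pull server
  final String partitionPath; // assumed location of the data on that host

  FetchTarget(String host, int port, String partitionPath) {
    this.host = host;
    this.port = port;
    this.partitionPath = partitionPath;
  }
}

final class LocalFirstPlan {
  final List<FetchTarget> localReads = new ArrayList<>();    // read directly from local disk
  final List<FetchTarget> remoteFetches = new ArrayList<>(); // still fetched over the network

  /**
   * Splits fetch targets by whether this worker's own pull server serves them.
   * Matching on both host and port (an assumption) keeps another worker on the
   * same machine, which may use a different temporary directory, on the remote path.
   */
  static LocalFirstPlan split(String localHost, int localPort, List<FetchTarget> targets) {
    LocalFirstPlan plan = new LocalFirstPlan();
    for (FetchTarget t : targets) {
      if (t.host.equals(localHost) && t.port == localPort) {
        plan.localReads.add(t);
      } else {
        plan.remoteFetches.add(t);
      }
    }
    return plan;
  }
}
{code}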
--
This message was sent by Atlassian JIRA
(v6.2#6252)