[jira] [Commented] (TAJO-292) Too many intermediate partition files

Hyunsik Choi (JIRA) Wed, 04 Dec 2013 07:20:24 -0800

    [ 
https://issues.apache.org/jira/browse/TAJO-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838973#comment-13838973
 ]


Hyunsik Choi commented on TAJO-292:
-----------------------------------

For me, this is a good workaround for this problem. Here are my comments on 
your patch.

 * It would be better to rename tajo.worker.start.cleanup to 
tajo.worker.tmpdir.cleanup-at-startup, because the option applies to 
tajo.worker.tmpdir and the new name is more consistent with it.
 * The code below should be inserted at the end of 
WorkerManagerService::cleanup(). In addition, cleanup's return type needs to be 
BoolProto.
{code}
done.run(TajoWorker.TRUE_PROTO);
{code}
** The async RPC internally keeps a callback sequence id in a concurrent map 
until the call returns, so done.run must be called once.
 * For the same reason, line 184 of QueryMaster should be changed to:
{code}
tajoWorkerProtocolService.cleanup(null, queryId.getProto(), NullCallback.get());
{code}
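
To illustrate why the callback matters, here is a minimal sketch of the bookkeeping described above. It is not Tajo's actual RPC code: the class CallbackRegistrySketch and its methods call, complete and pendingCount are hypothetical names, and the only assumption taken from the comment is that a sequence id stays in a concurrent map until the callback is run.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical stand-in for an async RPC client's pending-callback table.
public class CallbackRegistrySketch {
    // sequence id -> callback waiting for the response
    private final ConcurrentHashMap<Integer, Consumer<Boolean>> pending =
        new ConcurrentHashMap<>();
    private final AtomicInteger nextSeq = new AtomicInteger();

    // Registers a callback for an outgoing call and returns its sequence id.
    public int call(Consumer<Boolean> callback) {
        int seq = nextSeq.getAndIncrement();
        pending.put(seq, callback);
        return seq;
    }

    // Plays the role of done.run(...): removes the entry and runs the callback.
    // Returns false if the id is unknown or was already completed.
    public boolean complete(int seq, boolean result) {
        Consumer<Boolean> callback = pending.remove(seq);
        if (callback == null) {
            return false;
        }
        callback.accept(result);
        return true;
    }

    // Number of calls whose callbacks have not been run yet.
    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        CallbackRegistrySketch rpc = new CallbackRegistrySketch();
        int answered = rpc.call(ok -> System.out.println("cleanup acknowledged: " + ok));
        rpc.call(ok -> System.out.println("never printed"));

        // Only the first call is answered; the second entry stays in the map.
        rpc.complete(answered, true);
        System.out.println("entries still pending: " + rpc.pendingCount());
    }
}
```

In this sketch, a service method that never runs its callback leaves one entry behind per call, which is the leak the two changes above are meant to avoid.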

> Too many intermediate partition files
> -------------------------------------
>
>                 Key: TAJO-292
>                 URL: https://issues.apache.org/jira/browse/TAJO-292
>             Project: Tajo
>          Issue Type: Bug
>          Components: repartitioning
>    Affects Versions: 0.2-incubating
>            Reporter: Hyunsik Choi
>            Assignee: Jinho Kim
>            Priority: Critical
>             Fix For: 0.8-incubating
>
>         Attachments: TAJO-292.patch
>
>
> Unlike before, the number of partitions is currently determined by 
> the volume size and the number of distinct keys. This can cause unnecessary 
> overhead. We need to improve the partition number determiner to consider the 
> number of cluster nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
