[
https://issues.apache.org/jira/browse/TAJO-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838973#comment-13838973
]
Hyunsik Choi commented on TAJO-292:
-----------------------------------
To me, this looks like a good workaround for this problem. Here are my comments
on your patch.
* It would be better to rename tajo.worker.start.cleanup to
tajo.worker.tmpdir.cleanup-at-startup. The option applies to
tajo.worker.tmpdir, so sharing that prefix is more consistent.
* The code below should be inserted at the end of
WorkerManagerService::cleanup(). In addition, cleanup's return type needs to be
BoolProto.
{code}
done.run(TajoWorker.TRUE_PROTO);
{code}
** The async RPC layer internally keeps a callback sequence id in a concurrent
map until the call returns. So, done.run must be called once.
* For the same reason, line 184 in QueryMaster should be changed to:
{code}
tajoWorkerProtocolService.cleanup(null, queryId.getProto(), NullCallback.get());
{code}
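To illustrate the point about the callback sequence id, here is a minimal, self-contained sketch. It is not Tajo's actual RPC code: AsyncClient, Callback, NULL_CALLBACK, and the two cleanup handlers are hypothetical names made up for this example. It only shows the general pattern, where an entry registered per call stays in the concurrent map unless the handler invokes done.run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class PendingCallbackSketch {

    /** Hypothetical stand-in for an RPC callback type (not Tajo's or protobuf's class). */
    interface Callback<T> {
        void run(T result);
    }

    /** Hypothetical async client that tracks in-flight calls by sequence id. */
    static class AsyncClient {
        private final AtomicInteger nextSeqId = new AtomicInteger();
        private final Map<Integer, Callback<Boolean>> pending = new ConcurrentHashMap<>();

        /** Registers the caller's callback, then passes a "done" callback to the handler. */
        void call(Consumer<Callback<Boolean>> handler, Callback<Boolean> callerCallback) {
            int seqId = nextSeqId.getAndIncrement();
            pending.put(seqId, callerCallback);
            // The map entry is removed only when the handler runs "done".
            Callback<Boolean> done = result -> {
                Callback<Boolean> cb = pending.remove(seqId);
                if (cb != null) {
                    cb.run(result);
                }
            };
            handler.accept(done);
        }

        int pendingCount() {
            return pending.size();
        }
    }

    /** Hypothetical no-op callback for callers that do not need the result. */
    static final Callback<Boolean> NULL_CALLBACK = result -> { };

    /** Handler that forgets to reply: its map entry is never removed. */
    static void cleanupWithoutReply(Callback<Boolean> done) {
        // cleanup work would go here; done.run is never called
    }

    /** Handler that replies once at the end, as suggested in the review. */
    static void cleanupWithReply(Callback<Boolean> done) {
        // cleanup work would go here
        done.run(Boolean.TRUE);
    }

    public static void main(String[] args) {
        AsyncClient leaky = new AsyncClient();
        leaky.call(PendingCallbackSketch::cleanupWithoutReply, NULL_CALLBACK);
        System.out.println("pending without done.run: " + leaky.pendingCount()); // 1

        AsyncClient clean = new AsyncClient();
        clean.call(PendingCallbackSketch::cleanupWithReply, NULL_CALLBACK);
        System.out.println("pending with done.run: " + clean.pendingCount()); // 0
    }
}
```

In this sketch, every call whose handler skips done.run leaves one entry behind, so a frequently invoked method such as cleanup would grow the map without bound.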
> Too many intermediate partition files
> -------------------------------------
>
> Key: TAJO-292
> URL: https://issues.apache.org/jira/browse/TAJO-292
> Project: Tajo
> Issue Type: Bug
> Components: repartitioning
> Affects Versions: 0.2-incubating
> Reporter: Hyunsik Choi
> Assignee: Jinho Kim
> Priority: Critical
> Fix For: 0.8-incubating
>
> Attachments: TAJO-292.patch
>
>
> Unlike before, the number of partitions is currently determined by
> the volume size and the number of distinct keys. This can cause unnecessary
> overhead. We need to improve the partition number determiner to consider the
> number of cluster nodes.
--
This message was sent by Atlassian JIRA
(v6.1#6144)