[ 
https://issues.apache.org/jira/browse/TAJO-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837586#comment-13837586
 ] 

Jihoon Son commented on TAJO-292:
---------------------------------

When the intermediate data is sufficiently large, the number of tasks appears 
to equal the number of worker slots.
In my opinion, since the number of tasks is then fixed regardless of the size 
of the intermediate data, each task grows with the data, so the cost of a task 
failure will increase as the intermediate data grows.
How about limiting the maximum task size?
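
As a rough sketch of the idea (not taken from the Tajo code base or the 
attached patch; the function name, parameter names, and the 256 MB default cap 
are all hypothetical), the task count could be raised above the number of 
worker slots whenever a task would otherwise exceed the cap:

```python
def capped_task_count(intermediate_bytes, worker_slots,
                      max_task_bytes=256 * 1024 * 1024):
    """Return a task count such that no task exceeds max_task_bytes.

    All names and the default cap are illustrative assumptions.
    """
    if worker_slots < 1 or max_task_bytes < 1:
        raise ValueError("worker_slots and max_task_bytes must be positive")
    # Integer ceiling division: tasks needed to keep each task under the cap.
    tasks_for_cap = (intermediate_bytes + max_task_bytes - 1) // max_task_bytes
    # Keep at least one task per worker slot, as in the behavior described
    # above, but add more tasks once the data outgrows slots * cap.
    return max(worker_slots, tasks_for_cap)
```

With 8 worker slots and 10 GiB of intermediate data this gives 40 tasks of 
256 MiB each instead of 8 tasks of 1.25 GiB each, so a single failed task 
would redo far less work.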

> Too many intermediate partition files
> -------------------------------------
>
>                 Key: TAJO-292
>                 URL: https://issues.apache.org/jira/browse/TAJO-292
>             Project: Tajo
>          Issue Type: Bug
>          Components: repartitioning
>    Affects Versions: 0.2-incubating
>            Reporter: Hyunsik Choi
>            Assignee: Jinho Kim
>            Priority: Critical
>             Fix For: 0.8-incubating
>
>         Attachments: TAJO-292.patch
>
>
> Unlike before, the number of partitions is currently determined by the 
> volume size and the number of distinct keys. This can cause unnecessary 
> overhead. We need to improve the partition number determiner to consider the 
> number of cluster nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
