Jihoon Son created TAJO-473:
-------------------------------
Summary: Improve the fault tolerance of LazyTaskScheduler
Key: TAJO-473
URL: https://issues.apache.org/jira/browse/TAJO-473
Project: Tajo
Issue Type: New Feature
Components: query master
Affects Versions: 0.2-incubating
Reporter: Jihoon Son
As discussed in TAJO-385 and https://reviews.apache.org/r/16455/, the
LazyTaskScheduler has a problem when tasks fail.
When a failed task that covers multiple fragments is re-assigned to a node, the
locality of its fragments is extremely hard to preserve, because it is nearly
impossible for all of the fragments to be stored on two or more common hosts.
A simple and effective solution is to create a separate query unit attempt for
each fragment when a failed task is reattempted. To implement this approach,
we should maintain the query processing attempt information for each
fragment, not for each query unit.
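
The proposal above might be sketched roughly as follows. This is only an
illustration of the idea, not the Tajo implementation: the class and method
names (FragmentRetrySketch, Fragment, Attempt, splitFailedTask) and the host
names are hypothetical. The attempt counter is keyed by fragment rather than
by query unit, and a failed multi-fragment task yields one new attempt per
fragment, so each attempt can be scheduled on a host that stores its own
fragment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Tajo code base.
class FragmentRetrySketch {

    /** A fragment and the hosts that hold a replica of it. */
    static final class Fragment {
        final String id;
        final List<String> hosts;

        Fragment(String id, String... hosts) {
            this.id = id;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** One retry attempt that covers exactly one fragment. */
    static final class Attempt {
        final Fragment fragment;
        final int attemptNo;

        Attempt(Fragment fragment, int attemptNo) {
            this.fragment = fragment;
            this.attemptNo = attemptNo;
        }
    }

    // Attempt history is keyed by fragment id, not by query unit.
    private final Map<String, Integer> attemptsPerFragment = new HashMap<>();

    /** Splits a failed multi-fragment task into one new attempt per fragment. */
    List<Attempt> splitFailedTask(List<Fragment> failedTaskFragments) {
        List<Attempt> attempts = new ArrayList<>();
        for (Fragment f : failedTaskFragments) {
            // Increment this fragment's own attempt counter (starts at 1).
            int attemptNo = attemptsPerFragment.merge(f.id, 1, Integer::sum);
            attempts.add(new Attempt(f, attemptNo));
        }
        return attempts;
    }

    public static void main(String[] args) {
        FragmentRetrySketch tracker = new FragmentRetrySketch();
        // Two fragments of one failed task that share no common host.
        List<Fragment> failed = Arrays.asList(
            new Fragment("f1", "host-a", "host-b"),
            new Fragment("f2", "host-c", "host-d"));
        for (Attempt a : tracker.splitFailedTask(failed)) {
            System.out.println(a.fragment.id + " attempt " + a.attemptNo
                + " can run on " + a.fragment.hosts);
        }
    }
}
```

Because every attempt holds a single fragment, a scheduler only has to find a
host that stores that one fragment, instead of a host common to all fragments
of the original task.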