[ 
https://issues.apache.org/jira/browse/TAJO-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915810#comment-13915810
 ] 

Hyunsik Choi commented on TAJO-613:
-----------------------------------

+1 for this issue. We absolutely need a way to handle stragglers. 
Fortunately, TAJO-589 is in progress. It enables QueryMaster to track the 
progress of tasks, which in turn allows QueryMaster to detect unexpectedly 
slow tasks, as may occur in large clusters. I believe that we can implement 
straggler handling once TAJO-589 is done.
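
As a rough illustration of how detection on top of task progress tracking might look, here is a minimal sketch. It is not Tajo code: StragglerSketch, TaskProgress, and findStragglers are hypothetical names, and the rule (flag a task whose extrapolated run time exceeds a multiple of the median estimate) is an assumed heuristic, not something defined by TAJO-589.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Tajo.
final class StragglerSketch {

    /** Snapshot of one running task, as a progress tracker might report it. */
    static final class TaskProgress {
        final double progress;     // fraction completed, 0.0 to 1.0
        final long elapsedMillis;  // time since the task started

        TaskProgress(double progress, long elapsedMillis) {
            this.progress = progress;
            this.elapsedMillis = elapsedMillis;
        }

        /** Linear extrapolation of total run time; infinite if no progress yet. */
        double estimatedTotalMillis() {
            return progress <= 0.0 ? Double.POSITIVE_INFINITY : elapsedMillis / progress;
        }
    }

    /**
     * Returns the ids of tasks that have run for at least minElapsedMillis and
     * whose estimated total time exceeds slowdownFactor times the median estimate.
     */
    static List<String> findStragglers(Map<String, TaskProgress> tasks,
                                       double slowdownFactor,
                                       long minElapsedMillis) {
        List<String> stragglers = new ArrayList<>();
        if (tasks.isEmpty()) {
            return stragglers;
        }

        // Median (upper median for even sizes) of the estimated total times.
        List<Double> estimates = new ArrayList<>();
        for (TaskProgress p : tasks.values()) {
            estimates.add(p.estimatedTotalMillis());
        }
        Collections.sort(estimates);
        double median = estimates.get(estimates.size() / 2);

        for (Map.Entry<String, TaskProgress> e : tasks.entrySet()) {
            TaskProgress p = e.getValue();
            if (p.elapsedMillis >= minElapsedMillis
                    && p.estimatedTotalMillis() > slowdownFactor * median) {
                stragglers.add(e.getKey());
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        Map<String, TaskProgress> tasks = new LinkedHashMap<>();
        tasks.put("task-1", new TaskProgress(1.0, 3000));   // finished in 3 s
        tasks.put("task-2", new TaskProgress(0.5, 1500));
        tasks.put("task-3", new TaskProgress(0.9, 2700));
        tasks.put("task-4", new TaskProgress(0.02, 34000)); // on pace for ~1700 s
        System.out.println(findStragglers(tasks, 5.0, 10000));
    }
}
```

With these made-up numbers, only task-4 is flagged. A task flagged this way would be a candidate for a speculative duplicate attempt on another worker; the minimum elapsed time keeps tasks that have only just started from being flagged on noisy early estimates.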

> Hedging against unusually slow TajoWorker
> -----------------------------------------
>
>                 Key: TAJO-613
>                 URL: https://issues.apache.org/jira/browse/TAJO-613
>             Project: Tajo
>          Issue Type: Improvement
>            Reporter: Keuntae Park
>
> When one of the disks in my Tajo cluster becomes unhealthy (that is, its 
> response time becomes slow due to a hardware problem), query processing 
> becomes extremely slow.
> The following is the kernel log of the server that has the unhealthy disk:
> {noformat}
> Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Unhandled error code
> Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> Feb 18 15:20:12 ceo-tajo03 kernel: sd 0:2:4:0: [sde] CDB: Read(16): 88 00 00 00 00 01 57 ec 66 32 00 00 01 00 00 00
> ...
> {noformat}
> Because of this problem, a TaskRunner that normally takes less than 3 seconds 
> for the given query takes 1700 seconds, and the total query execution time 
> becomes 1750 seconds, compared with the usual 70 seconds.
> I think Tajo needs a mechanism like the speculative execution of MapReduce.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
