[ 
https://issues.apache.org/jira/browse/PIG-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136727#comment-14136727
 ] 

Rohini Palaniswamy commented on PIG-4069:
-----------------------------------------

[~daijy],
   I would still like to check in this patch with the   if (tezOp.isLimit()..) 
block commented out in TezDAGBuilder, as I am planning to do PIG-4039 (package 
refactor of tez classes) tomorrow and do not want to spend time rebasing this 
patch later. Also, this patch does more than set the low src fraction: it cleans 
up TezOperator a bit and fixes a bug with Order by followed by filter or foreach 
and then Limit. Can you review it?

I would love to keep the block uncommented if not for the fact that it reruns 
the tasks of the first vertex (TEZ-1590) when the second vertex finishes before 
it. Otherwise the time taken would be the same if LIMIT is the last statement, 
and performance might improve if there are operations following the LIMIT. This 
issue could be happening even now with LIMIT, but this patch would make it much 
more likely. I will uncomment the code once TEZ-1590 is fixed. 

I have no time to work on the custom scheduler right now. I will create a 
separate jira to work on it next quarter or next year, as LIMIT is not a very 
high priority for optimization and this would require features in Tez as well.


> Limit reduce task should start as soon as one map task finishes
> ---------------------------------------------------------------
>
>                 Key: PIG-4069
>                 URL: https://issues.apache.org/jira/browse/PIG-4069
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>         Attachments: PIG-4069-1.patch
>
>
>   Set very low values for 
> ShuffleVertexManager.TEZ_AM_SHUFFLE_VERTEX_MANAGER_MIN_SRC_FRACTION and 
> ShuffleVertexManager.TEZ_AM_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION in case 
> of LIMIT job not following an order by so that the reduce task starts as soon 
> as 1 map task finishes.
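
The rule in the description above could be sketched roughly as follows. This is 
only an illustration, not the code in PIG-4069-1.patch: the class name, the 
method shuffleOverrides, the property key strings, and the value 0.00001 are all 
assumptions made for the example, and the real keys come from the 
ShuffleVertexManager constants of whichever Tez version is in use.

```java
import java.util.HashMap;
import java.util.Map;

public class LimitSrcFractionSketch {
    // Assumed property names; the ShuffleVertexManager constants may resolve
    // to different keys depending on the Tez version.
    static final String MIN_SRC_FRACTION = "tez.shuffle-vertex-manager.min-src-fraction";
    static final String MAX_SRC_FRACTION = "tez.shuffle-vertex-manager.max-src-fraction";

    // Hypothetical "very low" value, chosen only for illustration.
    static final String VERY_LOW_FRACTION = "0.00001";

    /**
     * Returns the shuffle vertex manager overrides for an operator.
     * Only a LIMIT that does not follow an order by gets the low fractions,
     * so that its reduce task can start once a single map task has finished.
     */
    static Map<String, String> shuffleOverrides(boolean isLimit, boolean followsOrderBy) {
        Map<String, String> overrides = new HashMap<>();
        if (isLimit && !followsOrderBy) {
            overrides.put(MIN_SRC_FRACTION, VERY_LOW_FRACTION);
            overrides.put(MAX_SRC_FRACTION, VERY_LOW_FRACTION);
        }
        return overrides;
    }

    public static void main(String[] args) {
        // Plain LIMIT: both fractions are overridden.
        System.out.println(shuffleOverrides(true, false));
        // LIMIT after an order by: no overrides are applied.
        System.out.println(shuffleOverrides(true, true));
    }
}
```

In a real DAG builder the returned entries would presumably be copied into the 
vertex manager configuration for the LIMIT vertex; the map is used here only to 
keep the sketch free of Tez dependencies.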



