[
https://issues.apache.org/jira/browse/PIG-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136727#comment-14136727
]
Rohini Palaniswamy commented on PIG-4069:
-----------------------------------------
[~daijy],
I would still like to check in this patch with the if (tezOp.isLimit()..)
block commented out in TezDAGBuilder, as I am planning to do PIG-4039 (package
refactor of tez classes) tomorrow and do not want to spend time rebasing this
patch later. Also, this patch has more than the low src fraction change. It
cleans up TezOperator a bit and fixes a bug with Order by followed by filter or
foreach and then Limit. Can you review it?
I would love to keep the block uncommented if not for the fact that it reruns
the tasks of the first vertex (TEZ-1590) when the second vertex finishes before
it. Otherwise the time taken would be the same if LIMIT is the last statement,
and it might improve performance if there are operations following the LIMIT.
This issue could be happening even now with LIMIT, but this patch would make it
much more likely. Will uncomment the code once TEZ-1590 is fixed.
No time to work on the custom scheduler right now. Will create a separate jira
to work on it next quarter or year as LIMIT is not very high priority for
optimization and this would require features in Tez as well.
> Limit reduce task should start as soon as one map task finishes
> ---------------------------------------------------------------
>
> Key: PIG-4069
> URL: https://issues.apache.org/jira/browse/PIG-4069
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Reporter: Rohini Palaniswamy
> Fix For: 0.14.0
>
> Attachments: PIG-4069-1.patch
>
>
> Set very low values for
> ShuffleVertexManager.TEZ_AM_SHUFFLE_VERTEX_MANAGER_MIN_SRC_FRACTION and
> ShuffleVertexManager.TEZ_AM_SHUFFLE_VERTEX_MANAGER_MAX_SRC_FRACTION in case
> of LIMIT job not following an order by so that the reduce task starts as soon
> as 1 map task finishes.
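
As a rough illustration of the description above, here is a minimal sketch of choosing the two source-fraction settings for a LIMIT vertex. It is not the code from PIG-4069-1.patch. The class name, method name, property key strings, and the value 0.00001 are assumptions made for illustration. In Pig the keys would come from the ShuffleVertexManager constants named in the description, and the settings would be applied to the Tez vertex configuration rather than returned as a plain map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LimitSrcFraction {
    // Assumed key strings; real code would reference the ShuffleVertexManager
    // constants from the Tez version in use instead of hard-coding them.
    static final String MIN_SRC_FRACTION = "tez.am.shuffle-vertex-manager.min-src-fraction";
    static final String MAX_SRC_FRACTION = "tez.am.shuffle-vertex-manager.max-src-fraction";

    // Hypothetical "very low" fraction; the value used by the patch is not stated.
    static final String LOW_FRACTION = "0.00001";

    // Returns the overrides for a vertex: both fractions are lowered only for
    // a LIMIT that does not follow an order by; otherwise nothing is changed.
    static Map<String, String> vertexManagerSettings(boolean isLimit, boolean followsOrderBy) {
        Map<String, String> conf = new LinkedHashMap<>();
        if (isLimit && !followsOrderBy) {
            conf.put(MIN_SRC_FRACTION, LOW_FRACTION);
            conf.put(MAX_SRC_FRACTION, LOW_FRACTION);
        }
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(vertexManagerSettings(true, false));
        System.out.println(vertexManagerSettings(true, true));
    }
}
```

Keeping the decision in one small method makes the "not following an order by" condition explicit and easy to disable while TEZ-1590 is open.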