[ https://issues.apache.org/jira/browse/PIG-4135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105186#comment-14105186 ]
Lorand Bendig commented on PIG-4135:
------------------------------------
[~cheolsoo], thanks for pointing out this issue. Filter is a good safeguard,
but it also narrows the use cases for fetch.
I'm wondering whether we could use an input size estimation instead, like
pig.auto.local.input.maxbytes?
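
To make the idea concrete, here is a rough sketch of such a size check. It is
not taken from the patch or from Pig's code base: the class FetchSizeGuard, the
method canFetch, and the convention that a negative size means "unknown" are
all hypothetical, and the threshold is simply passed in rather than read from
any real property.

```java
import java.util.List;

/** Hypothetical guard: allow fetch only when the estimated input is small. */
public final class FetchSizeGuard {
    private FetchSizeGuard() {}

    /**
     * Returns true when every input size is known and the total does not
     * exceed maxBytes. A negative size is treated as "unknown" (an assumption
     * of this sketch), and an unknown size disables fetch.
     */
    static boolean canFetch(List<Long> inputSizesInBytes, long maxBytes) {
        if (maxBytes < 0) {
            return false; // a negative threshold disables fetch entirely
        }
        long total = 0;
        for (long size : inputSizesInBytes) {
            if (size < 0) {
                return false; // size could not be estimated
            }
            // Compare against the remaining budget to avoid long overflow.
            if (size > maxBytes - total) {
                return false;
            }
            total += size;
        }
        return true;
    }

    public static void main(String[] args) {
        long limit = 100L * 1000 * 1000; // illustrative threshold only
        System.out.println(canFetch(List.of(40_000_000L, 30_000_000L), limit));
        System.out.println(canFetch(List.of(5_000_000_000L), limit));
    }
}
```

With a check like this, a large input that is filtered down to a small output
would still run in the cluster, while small inputs could keep using fetch even
without a limit.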
> Fetch optimization should be disabled if plan contains no limit
> ---------------------------------------------------------------
>
> Key: PIG-4135
> URL: https://issues.apache.org/jira/browse/PIG-4135
> Project: Pig
> Issue Type: Bug
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.14.0
>
> Attachments: PIG-4135-1.patch
>
>
> After deploying fetch optimization in production, a couple of users ran into
> this situation: they had fairly large input data, but after filtering it by a
> regular expression, it became small, so they didn't add a limit to the query.
> The problem is that even though the output is small, the input must be
> processed in the cluster, not in the client. However, fetch optimization
> blindly fetches the entire input into the client because the plan is a
> map-only job that finishes with a dump.
--
This message was sent by Atlassian JIRA
(v6.2#6252)