[
https://issues.apache.org/jira/browse/PIG-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Olga Natkovich updated PIG-1660:
--------------------------------
Fix Version/s: 0.10
> Consider passing result of COUNT/COUNT_STAR to LIMIT
> -----------------------------------------------------
>
> Key: PIG-1660
> URL: https://issues.apache.org/jira/browse/PIG-1660
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.7.0
> Reporter: Viraj Bhat
> Fix For: 0.10
>
>
> In realistic scenarios we need to split a dataset into segments using
> LIMIT, and we would like to do so within a single Pig script. Here is a
> case:
> {code}
> A = load '$DATA' using PigStorage(',') as (id, pvs);
> B = group A all;
> C = foreach B generate COUNT_STAR(A) as row_cnt;
> -- get the low 20% segment
> D = order A by pvs;
> E = limit D (C.row_cnt * 0.2);
> store E into '$Eoutput';
> -- get the high 20% segment
> F = order A by pvs DESC;
> G = limit F (C.row_cnt * 0.2);
> store G into '$Goutput';
> {code}
> Since LIMIT only accepts constants, we have to split the operation into two
> steps in order to pass in the constants for the LIMIT statements. Please
> consider adding this feature so that the processing can be more efficient.
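>
> For illustration, the current two-step workaround might look roughly like
> the sketch below. The script names (count.pig, segment.pig) and the
> parameters $CNT_OUTPUT and $N are hypothetical; N is assumed to be computed
> outside Pig from the stored count and passed in with -param.
> {code}
> -- count.pig (hypothetical): step 1, compute and store the row count
> A = load '$DATA' using PigStorage(',') as (id, pvs);
> B = group A all;
> C = foreach B generate COUNT_STAR(A) as row_cnt;
> store C into '$CNT_OUTPUT';
>
> -- segment.pig (hypothetical): step 2, run with -param N=<20% of row_cnt>
> A = load '$DATA' using PigStorage(',') as (id, pvs);
> D = order A by pvs;
> E = limit D $N;
> store E into '$Eoutput';
> {code}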
> Viraj