[ 
https://issues.apache.org/jira/browse/PIG-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630490#comment-13630490
 ] 

Mythili Gopalakrishnan commented on PIG-821:
--------------------------------------------

Hi

Is it possible to do RANK over a GROUP?

For example in the following dataset, I would like to rank by Income and then 
rank within Industry by Income..

Name Industry Income

John  Banking            20,000
Jane  Banking            35,000 
Chen  Real Estate        30,000
Hari  Real Estate        22,000
Asha  Technology         26,000

I tried the following script listed below but get ERROR unexpected SYMBOL at or 
near RANK.

.....

names_by_ind = group names by industry;

rank_by_ind = foreach names_by_ind {
results = RANK names BY income DESC;
GENERATE flatten(results);
}

                
> simulate NTILE(n) , rank() functionality in pig
> -----------------------------------------------
>
>                 Key: PIG-821
>                 URL: https://issues.apache.org/jira/browse/PIG-821
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.2.0
>         Environment: mithril gold -gateway 4000
>            Reporter: Rekha
>             Fix For: 0.11
>
>
> Hi,
> I came across a job which has some processing which I cant seem to get easily 
> over-the-counter from pig.
> These are NTILE() /rank() operations available in oracle.
> While I am trying to write a UDF, that is not working out too well for me 
> yet.. :(
> I have a ntile(n) over (partititon by x, y, z order by a desc, b desc) 
> operation to be done in pig scripts.
> Is there a default function in pig scripting which can do this?
> For example, lets consider a simple example at 
> http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/functions091.htm
> So here, how would we ideally substitute NTILE() with? any pig counterpart 
> function/udf?
> SELECT last_name, salary, NTILE(4) OVER (ORDER BY salary DESC) 
>    AS quartile FROM employees
>    WHERE department_id = 100;
>  
> LAST_NAME                     SALARY   QUARTILE
> ------------------------- ---------- ----------
> Greenberg                      12000          1
> Faviet                          9000          1
> Chen                            8200          2
> Urman                           7800          2
> Sciarra                         7700          3
> Popp                            6900          4
>  
> In real case, i have ntile over multiple columns, so ideal way to find 
> histograms/boundary/spitting out the bucket number is needed.
> Similarly a pig function is required for rank() over(partition by a,b,c order 
> by d desc) as e
> Please let me know soon.
> Thanks & Regards,
> /Rekha

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to