Hi Zheng,

Thanks for your response. I had initially used ints, but because of the
error I got, I changed them to Integers. I have now reverted the code to use
ints, as you suggested.

My problem:
I have a table called products_bought, which stores the number of products
bought by each customer, ordered by the count bought. I want to get the top
x customers for each product.

Table products_bought:

product_id  customer_id  product_count
1           1            6
1           2            5
1           3            4
2           1            8
2           2            4
2           3            1

I want, say, the top 2 results per product, which would be:

product_id  customer_id  product_count
1           1            6
1           2            5
2           1            8
2           2            4
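
To make the expected result concrete, here is a minimal plain-Java sketch of
"top x rows per product". It only illustrates the output I am after and is
not the UDAF itself; the class name TopXPerProduct and the method topX are
hypothetical names made up for this example, and rows are represented as
int arrays purely for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopXPerProduct {

    // Each row is {product_id, customer_id, product_count}.
    // Returns the x rows with the highest product_count for each product_id,
    // keeping products in the order in which they first appear in the input.
    static List<int[]> topX(int x, List<int[]> rows) {
        Map<Integer, List<int[]>> byProduct = new LinkedHashMap<>();
        for (int[] row : rows) {
            byProduct.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<int[]> result = new ArrayList<>();
        for (List<int[]> group : byProduct.values()) {
            // Sort this product's rows by product_count, highest first.
            group.sort(Comparator.comparingInt((int[] r) -> r[2]).reversed());
            result.addAll(group.subList(0, Math.min(x, group.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> rows = new ArrayList<>();
        rows.add(new int[] {1, 1, 6});
        rows.add(new int[] {1, 2, 5});
        rows.add(new int[] {1, 3, 4});
        rows.add(new int[] {2, 1, 8});
        rows.add(new int[] {2, 2, 4});
        rows.add(new int[] {2, 3, 1});
        for (int[] r : topX(2, rows)) {
            System.out.println(r[0] + " " + r[1] + " " + r[2]);
        }
    }
}
```

With the six sample rows above and x = 2, this yields the four rows listed in
the expected result.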

Solution:
I built a jar from the code I sent and ran the following steps in the Hive CLI:

1. add jar jarname
2. create temporary function topx as 'class name';
3. select topx(2, product_id, customer_id, product_count) from products_bought

The logs give me the error:
10/02/01 16:56:28 DEBUG ipc.RPC: Call: mkdirs 23
10/02/01 16:56:28 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
10/02/01 16:56:28 DEBUG parse.SemanticAnalyzer: Created Table Plan for products_bought org.apache.hadoop.hive.ql.exec.tablescanopera...@72d8978c
10/02/01 16:56:28 DEBUG exec.FunctionRegistry: Looking up GenericUDAF: topx
FAILED: Unknown exception : Internal error: Cannot recognize int
10/02/01 16:56:28 ERROR ql.Driver: FAILED: Unknown exception : Internal error: Cannot recognize int
java.lang.RuntimeException: Internal error: Cannot recognize int
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.getPrimitiveObjectInspectorFromClass(PrimitiveObjectInspectorFactory.java:166)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$PrimitiveConversionHelper.<init>(GenericUDFUtils.java:197)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.init(GenericUDAFBridge.java:123)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFInfo(SemanticAnalyzer.java:1592)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:1912)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2452)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:3733)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4184)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:4425)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:249)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:281)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I am going through the code Zheng mentioned to see whether I am doing
something wrong. At this point, my main concern is to get the function to
output something and to verify that the Hive-specific hooks are in place.
If you have any suggestions, please let me know.
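
One way to check the aggregation logic separately from the Hive hooks is a
small plain-Java harness. The sketch below borrows the method names of the
usual UDAF evaluator lifecycle (init, iterate, terminatePartial, merge,
terminate) but deliberately does not depend on any Hive classes, so it says
nothing about the "Cannot recognize int" error itself. The class
TopXEvaluatorSketch and its nested State class are hypothetical names, and the
state layout is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, Hive-free stand-in for the aggregation state of one group.
public class TopXEvaluatorSketch {

    // Partial aggregation state (assumed layout, not a Hive type).
    public static class State {
        int limit;                             // the "x" in "top x"
        List<int[]> rows = new ArrayList<>();  // {customer_id, product_count}
    }

    private State state = new State();

    // Reset the state before a new group is processed.
    public void init() {
        state = new State();
    }

    // Consume one input row for the current group.
    public boolean iterate(int x, int customerId, int productCount) {
        state.limit = x;
        state.rows.add(new int[] {customerId, productCount});
        trim();
        return true;
    }

    // Hand out the partial result (what a map-side aggregation would emit).
    public State terminatePartial() {
        return state;
    }

    // Fold another partial result into this one.
    public boolean merge(State other) {
        if (other != null) {
            state.limit = Math.max(state.limit, other.limit);
            state.rows.addAll(other.rows);
            trim();
        }
        return true;
    }

    // Final result for the group: at most "limit" rows, highest count first.
    public List<int[]> terminate() {
        return state.rows;
    }

    // Keep only the "limit" rows with the highest product_count.
    private void trim() {
        state.rows.sort(Comparator.comparingInt((int[] r) -> r[1]).reversed());
        while (state.rows.size() > state.limit) {
            state.rows.remove(state.rows.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Two partial aggregations for product 1, merged as a reducer would.
        TopXEvaluatorSketch a = new TopXEvaluatorSketch();
        a.init();
        a.iterate(2, 1, 6);
        a.iterate(2, 3, 4);
        TopXEvaluatorSketch b = new TopXEvaluatorSketch();
        b.init();
        b.iterate(2, 2, 5);
        a.merge(b.terminatePartial());
        for (int[] r : a.terminate()) {
            System.out.println(r[0] + " " + r[1]);
        }
    }
}
```

If a harness like this produces the right rows for product 1, any remaining
failure is more likely to be in how the class is wired into Hive than in the
top-x logic.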

Thanks and Regards,
Sonal


On Mon, Feb 1, 2010 at 1:19 PM, Zheng Shao <[email protected]> wrote:

> The first problem is:
>
>                private Integer key;
>                private Integer attribute;
>                private Integer count;
>
> Java Integer objects are immutable, which means we have to create
> a new object per row (which in turn makes the code really
> inefficient).
>
> You can change it to "private int" to make it efficient (it also
> works with Hive).
>
>
> Second, can you post your Hive query? It seems your code does not do
> what you want. You might want to take a look at
> http://issues.apache.org/jira/browse/HIVE-894 for the UDAF max_n and
> see how that works for Hive.
>
> Zheng
>
> On Sun, Jan 31, 2010 at 9:38 PM, Sonal Goyal <[email protected]>
> wrote:
> > Hi,
> >
> > I am writing a UDAF which returns the top x results per key. Let's say my
> > input is
> >
> > key  attribute  count
> > 1    1          6
> > 1    2          5
> > 1    3          4
> > 2    1          8
> > 2    2          4
> > 2    3          1
> >
> > I want the top 2 results per key. Which will be:
> >
> > key  attribute  count
> > 1    1          6
> > 1    2          5
> > 2    1          8
> > 2    2          4
> >
> > I have written a UDAF for this in the attached file. However, when I run the
> > code, I get the exception:
> > FAILED: Unknown exception :
> > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector
> > cannot be cast to
> > org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableIntObjectInspector
> >
> >
> > Can anyone please let me know what I could be doing wrong?
> > Thanks and Regards,
> > Sonal
> >
>
>
>
> --
> Yours,
> Zheng
>
