Which version of Hive are you using? I looked at the code for trunk
and cannot find PrimitiveObjectInspectorFactory.java:166.

Zheng

On Mon, Feb 1, 2010 at 3:41 AM, Sonal Goyal <[email protected]> wrote:
> Hi Zheng,
>
> Thanks for your response. I had initially used ints, but due to the error I
> got, I changed them to Integers. I have now reverted the code to use ints as
> suggested by you.
>
> My problem:
> I have a table called products_bought which has a number of products bought
> by each customer ordered by count bought. I want to get the top x customers
> of each product.
>
> Table products_bought
> product_id  customer_id  product_count
> 1           1            6
> 1           2            5
> 1           3            4
> 2           1            8
> 2           2            4
> 2           3            1
>
> I want, say, the top 2 results per product, which will be:
>
> product_id  customer_id  product_count
> 1           1            6
> 1           2            5
> 2           1            8
> 2           2            4
>
> Solution:
> I create a jar with the code I sent and do the following steps in cli
>
> 1. add jar jarname
> 2. create temporary function topx as 'class name';
> 3. select topx(2, product_id, customer_id, product_count) from
>    products_bought
>
> The logs give me the error:
> 0/02/01 16:56:28 DEBUG ipc.RPC: Call: mkdirs 23
> 10/02/01 16:56:28 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
> 10/02/01 16:56:28 DEBUG parse.SemanticAnalyzer: Created Table Plan for products_bought org.apache.hadoop.hive.ql.exec.tablescanopera...@72d8978c
> 10/02/01 16:56:28 DEBUG exec.FunctionRegistry: Looking up GenericUDAF: topx
> FAILED: Unknown exception : Internal error: Cannot recognize int
> 10/02/01 16:56:28 ERROR ql.Driver: FAILED: Unknown exception : Internal error: Cannot recognize int
> java.lang.RuntimeException: Internal error: Cannot recognize int
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.getPrimitiveObjectInspectorFromClass(PrimitiveObjectInspectorFactory.java:166)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$PrimitiveConversionHelper.<init>(GenericUDFUtils.java:197)
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.init(GenericUDAFBridge.java:123)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFInfo(SemanticAnalyzer.java:1592)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:1912)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2452)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:3733)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4184)
>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:4425)
>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:249)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:281)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> I am going through the code mentioned by Zheng to see if there is something
> I am doing wrong. At this point, my main concern is to get the function to
> output something and to verify that the Hive-specific hooks are in place.
> If you have any suggestions, please do let me know.
>
> Thanks and Regards,
> Sonal
>
>
> On Mon, Feb 1, 2010 at 1:19 PM, Zheng Shao <[email protected]> wrote:
>>
>> The first problem is:
>>
>> private Integer key;
>> private Integer attribute;
>> private Integer count;
>>
>> Java Integer objects are non-modifiable, which means we have to create
>> a new object per row (which in turn makes the code really
>> inefficient).
>>
>> You can change it to "private int" to make it efficient (and it also
>> works for Hive).
>>
>>
>> Second, can you post your Hive query? It seems your code does not do
>> what you want. You might want to take a look at
>> http://issues.apache.org/jira/browse/HIVE-894 for the UDAF max_n and
>> see how that works for Hive.
>>
>> Zheng
>>
>> On Sun, Jan 31, 2010 at 9:38 PM, Sonal Goyal <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > I am writing a UDAF which returns the top x results per key. Let's say my
>> > input is
>> >
>> > key  attribute  count
>> > 1    1          6
>> > 1    2          5
>> > 1    3          4
>> > 2    1          8
>> > 2    2          4
>> > 2    3          1
>> >
>> > I want the top 2 results per key, which will be:
>> >
>> > key  attribute  count
>> > 1    1          6
>> > 1    2          5
>> > 2    1          8
>> > 2    2          4
>> >
>> > I have written a UDAF for this in the attached file. However, when I run
>> > the code, I get the exception:
>> > FAILED: Unknown exception :
>> > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector
>> > cannot be cast to
>> > org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableIntObjectInspector
>> >
>> >
>> > Can anyone please let me know what I could be doing wrong?
>> > Thanks and Regards,
>> > Sonal
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>
>

--
Yours,
Zheng
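For reference, the "top x results per key" selection described in the thread can be sketched in plain Java, independent of Hive. This is only an illustration of the selection logic: it is neither the attached UDAF nor the max_n code from HIVE-894, and the class name TopXSketch, the method topX, and the {key, attribute, count} row layout are assumptions made for the example. It uses primitive int values throughout, in line with the suggestion to avoid Integer fields.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Hive: keeps the x largest-count rows per key.
final class TopXSketch {

    /**
     * Each row is an int[] laid out as {key, attribute, count} (assumed layout).
     * Returns, for every key in first-seen order, at most x rows sorted by
     * count in descending order.
     */
    static Map<Integer, List<int[]>> topX(int x, List<int[]> rows) {
        // Orders rows by ascending count, so a heap's head is the smallest kept row.
        Comparator<int[]> byCount = Comparator.comparingInt(r -> r[2]);

        Map<Integer, PriorityQueue<int[]>> heaps = new LinkedHashMap<>();
        for (int[] row : rows) {
            PriorityQueue<int[]> heap =
                heaps.computeIfAbsent(row[0], k -> new PriorityQueue<int[]>(byCount));
            heap.add(row);
            if (heap.size() > x) {
                heap.poll(); // drop the row with the smallest count for this key
            }
        }

        Map<Integer, List<int[]>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, PriorityQueue<int[]>> e : heaps.entrySet()) {
            List<int[]> kept = new ArrayList<>(e.getValue());
            kept.sort(byCount.reversed()); // largest count first
            result.put(e.getKey(), kept);
        }
        return result;
    }
}
```

Called as TopXSketch.topX(2, rows) on the six sample rows from the thread, this keeps rows (1, 1, 6) and (1, 2, 5) for key 1 and rows (2, 1, 8) and (2, 2, 4) for key 2, matching the expected output quoted above. A bounded min-heap per key is used so that only x rows per key are held at any time.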
