Re: Help writing UDAF with custom object

Zheng Shao Wed, 03 Feb 2010 13:41:37 -0800

Which version of Hive are you using?

I looked at the code on trunk and cannot find the location named in the
stack trace (PrimitiveObjectInspectorFactory.java:166).


Zheng

On Mon, Feb 1, 2010 at 3:41 AM, Sonal Goyal <[email protected]> wrote:
> Hi Zheng,
>
> Thanks for your response. I had initially used ints, but due to the error I
> got, I changed them to Integers. I have now reverted the code to use ints as
> suggested by you.
>
> My problem:
> I have a table called products_bought that records how many units of each
> product each customer bought, ordered by count. I want to get the top x
> customers for each product.
>
> Table products_bought
>   product_id  customer_id  product_count
>   1           1            6
>   1           2            5
>   1           3            4
>   2           1            8
>   2           2            4
>   2           3            1
>
>   I want, say, the top 2 results per product, which would be:
>
>   product_id  customer_id  product_count
>   1           1            6
>   1           2            5
>   2           1            8
>   2           2            4
>
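For reference, the "top x customers per product" result described above can be sketched in plain Java. This is only an illustration of the intended output, not the attached UDAF and not Hive code: the class TopXSketch, the Row type and the topX method are hypothetical names, and it assumes all rows fit in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Plain-Java sketch of "top x customers per product"; no Hive classes are used.
final class TopXSketch {

    // Hypothetical row type mirroring the products_bought columns.
    static final class Row {
        final int productId;
        final int customerId;
        final int productCount;

        Row(int productId, int customerId, int productCount) {
            this.productId = productId;
            this.customerId = customerId;
            this.productCount = productCount;
        }

        @Override
        public String toString() {
            return productId + "\t" + customerId + "\t" + productCount;
        }
    }

    // Returns up to x rows per product, highest product_count first.
    static List<Row> topX(List<Row> rows, int x) {
        Comparator<Row> byCount = Comparator.comparingInt(r -> r.productCount);
        // One min-heap per product; LinkedHashMap keeps first-seen product order.
        Map<Integer, PriorityQueue<Row>> heaps = new LinkedHashMap<>();
        for (Row r : rows) {
            PriorityQueue<Row> heap =
                heaps.computeIfAbsent(r.productId, k -> new PriorityQueue<>(byCount));
            heap.add(r);
            if (heap.size() > x) {
                heap.poll(); // evict the smallest count so at most x rows remain
            }
        }
        List<Row> result = new ArrayList<>();
        for (PriorityQueue<Row> heap : heaps.values()) {
            List<Row> kept = new ArrayList<>(heap);
            kept.sort(byCount.reversed()); // largest count first within a product
            result.addAll(kept);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(1, 1, 6), new Row(1, 2, 5), new Row(1, 3, 4),
            new Row(2, 1, 8), new Row(2, 2, 4), new Row(2, 3, 1));
        for (Row r : topX(rows, 2)) {
            System.out.println(r);
        }
    }
}
```

Keeping a bounded heap per product means only x rows per product are held at any time, which is roughly the state a real aggregation function would need to carry between rows.
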
> Solution:
> I build a jar from the code I sent and run the following steps in the CLI:
>
> 1. add jar jarname
> 2. create temporary function topx as 'class name';
> 3. select topx(2, product_id, customer_id, product_count) from
> products_bought
>
> The logs give me the error:
> 10/02/01 16:56:28 DEBUG ipc.RPC: Call: mkdirs 23
> 10/02/01 16:56:28 INFO parse.SemanticAnalyzer: Completed getting MetaData in
> Semantic Analysis
> 10/02/01 16:56:28 DEBUG parse.SemanticAnalyzer: Created Table Plan for
> products_bought org.apache.hadoop.hive.ql.exec.tablescanopera...@72d8978c
> 10/02/01 16:56:28 DEBUG exec.FunctionRegistry: Looking up GenericUDAF: topx
> FAILED: Unknown exception : Internal error: Cannot recognize int
> 10/02/01 16:56:28 ERROR ql.Driver: FAILED: Unknown exception : Internal
> error: Cannot recognize int
> java.lang.RuntimeException: Internal error: Cannot recognize int
>     at
> org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.getPrimitiveObjectInspectorFromClass(PrimitiveObjectInspectorFactory.java:166)
>     at
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$PrimitiveConversionHelper.<init>(GenericUDFUtils.java:197)
>     at
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.init(GenericUDAFBridge.java:123)
>     at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFInfo(SemanticAnalyzer.java:1592)
>     at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:1912)
>     at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2452)
>     at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:3733)
>     at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:4184)
>     at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:4425)
>     at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:76)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:249)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:281)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:123)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:181)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:287)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
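The message "Cannot recognize int" names a Java class, which suggests the lookup was handed the primitive class int.class rather than java.lang.Integer. The sketch below is plain Java, not Hive code, and the thread does not confirm that this is what happens inside Hive: the Evaluator class, its iterate signature and the KNOWN table are all hypothetical. It only shows how a class-keyed lookup can recognize Integer and still miss int.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration; none of these names come from Hive.
final class PrimitiveLookupSketch {

    // Hypothetical evaluator; only the parameter types of iterate() matter here.
    static final class Evaluator {
        public boolean iterate(int productId, Integer customerId) {
            return true;
        }
    }

    // Hypothetical class-keyed table that lists the wrapper class only.
    static final Map<Class<?>, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put(Integer.class, "int");
    }

    // Looks up a class in the table and fails loudly when it is absent.
    static String recognize(Class<?> c) {
        String name = KNOWN.get(c);
        if (name == null) {
            // Class.toString() of a primitive class is just its name, e.g. "int".
            throw new RuntimeException("Cannot recognize " + c);
        }
        return name;
    }

    public static void main(String[] args) throws Exception {
        // Reflection reports 'int' and 'Integer' parameters as distinct Class objects.
        Method m = Evaluator.class.getMethod("iterate", int.class, Integer.class);
        for (Class<?> p : m.getParameterTypes()) {
            try {
                System.out.println(p + " -> " + recognize(p));
            } catch (RuntimeException e) {
                System.out.println(p + " -> " + e.getMessage());
            }
        }
    }
}
```

In this sketch the int parameter produces the message "Cannot recognize int" while the Integer parameter is found, because int.class and Integer.class are different keys.
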
> I am going through the code Zheng mentioned to see whether I am doing
> something wrong. At this point, my main concern is to get the function to
> output something and to verify that the Hive-specific hooks are in place.
> If you have any suggestions, please do let me know.
>
> Thanks and Regards,
> Sonal
>
>
> On Mon, Feb 1, 2010 at 1:19 PM, Zheng Shao <[email protected]> wrote:
>>
>> The first problem is:
>>
>>                private Integer key;
>>                private Integer attribute;
>>                private Integer count;
>>
>> Java Integer objects are immutable, which means we have to create
>> a new object per row (which in turn makes the code really
>> inefficient).
>>
>> You can change the fields to "private int" to make the code efficient
>> (this also works with Hive).
>>
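To illustrate the point about immutability, here is a minimal plain-Java sketch. It is not the attached UDAF and uses no Hive classes; RowStateSketch, RowState and the method names are hypothetical. It shows one state object with primitive int fields being overwritten for every row, and that "updating" an Integer only rebinds the variable to a different boxed value.

```java
// Plain-Java sketch; RowState and its accessors are hypothetical names.
final class RowStateSketch {

    // Primitive fields can be overwritten in place, so one instance serves every row.
    static final class RowState {
        private int key;
        private int attribute;
        private int count;

        void set(int key, int attribute, int count) {
            this.key = key;
            this.attribute = attribute;
            this.count = count;
        }

        int key() { return key; }
        int attribute() { return attribute; }
        int count() { return count; }
    }

    // Feeds every row through a single reused RowState and sums the counts.
    static int sumCounts(int[][] rows) {
        RowState state = new RowState(); // allocated once, not once per row
        int total = 0;
        for (int[] row : rows) {
            state.set(row[0], row[1], row[2]);
            total += state.count();
        }
        return total;
    }

    // With Integer, an "update" rebinds the reference; the original object is unchanged.
    static int originalAfterIncrement() {
        Integer boxed = 1000;
        Integer original = boxed;
        boxed = boxed + 1; // unboxes, adds, boxes the result; 'original' is untouched
        return original;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 1, 6}, {1, 2, 5}, {1, 3, 4}};
        System.out.println(sumCounts(rows));
        System.out.println(originalAfterIncrement());
    }
}
```
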
>>
>> Second, can you post your Hive query? It seems your code does not do
>> what you want. You might want to take a look at
>> http://issues.apache.org/jira/browse/HIVE-894 for the UDAF max_n and
>> see how that works for Hive.
>>
>> Zheng
>>
>> On Sun, Jan 31, 2010 at 9:38 PM, Sonal Goyal <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > I am writing a UDAF which returns the top x results per key. Let's say my
>> > input is
>> >
>> > key  attribute  count
>> > 1    1          6
>> > 1    2          5
>> > 1    3          4
>> > 2    1          8
>> > 2    2          4
>> > 2    3          1
>> >
>> > I want the top 2 results per key, which would be:
>> >
>> > key  attribute  count
>> > 1    1          6
>> > 1    2          5
>> > 2    1          8
>> > 2    2          4
>> >
>> > I have written a UDAF for this in the attached file. However, when I run
>> > the code, I get the exception:
>> > FAILED: Unknown exception :
>> > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaIntObjectInspector
>> > cannot be cast to
>> > org.apache.hadoop.hive.serde2.objectinspector.primitive.SettableIntObjectInspector
>> >
>> >
>> > Can anyone please let me know what I could be doing wrong?
>> > Thanks and Regards,
>> > Sonal
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>
>



-- 
Yours,
Zheng
