MySQL has a function called "greatest" which returns the max of several
values (as opposed to max, which is an aggregate function over a
column).  Here's what it returns:

select greatest(1, 2)
2

select greatest(1,null)
null

On the other hand, the max aggregate function returns 2 when a table
column has 3 rows, with values (null, 1, 2).
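
If you want to see the same split without a MySQL install, here is a rough
sketch using Python's built-in sqlite3 module. Assumptions: SQLite is only
standing in for MySQL, its multi-argument max() plays the role of greatest(),
and the table t and column x are made up for the example.

```python
import sqlite3


def scalar_vs_aggregate_max():
    """Return (scalar max of 1 and null, aggregate max over rows null, 1, 2)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Multi-argument max() is SQLite's scalar counterpart to greatest().
        scalar = conn.execute("SELECT max(1, NULL)").fetchone()[0]
        # Hypothetical one-column table holding the three rows (null, 1, 2).
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t (x) VALUES (?)", [(None,), (1,), (2,)])
        aggregate = conn.execute("SELECT max(x) FROM t").fetchone()[0]
        return scalar, aggregate
    finally:
        conn.close()


print(scalar_vs_aggregate_max())  # prints (None, 2)
```

The scalar form propagates the null while the aggregate form skips it.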

So much for consistency.

So what's the answer here? I have no idea. Erroring on underspecified
behaviors and letting users handle null cases as makes sense to them
at least doesn't cause bizarre hard-to-find data bugs 12 hours into a
27-step computation.
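
To make the "error out and let the user decide" option concrete, here is a
rough sketch in Python. It is not Pig or piggybank code, and strict_max and
max_ignoring_nulls are made-up names, just one way the two policies could look.

```python
def strict_max(*values):
    """Max of the arguments; raise instead of guessing when any is None."""
    if not values:
        raise ValueError("strict_max needs at least one value")
    if any(v is None for v in values):
        raise ValueError("strict_max got a null input: %r" % (values,))
    return max(values)


def max_ignoring_nulls(*values):
    """An explicit opt-in policy: skip nulls, the way SQL's aggregate max does."""
    present = [v for v in values if v is not None]
    return max(present) if present else None
```

With this split, strict_max(1, None) fails on the first bad row, and a user
who really wants nulls skipped has to say so by calling max_ignoring_nulls.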

D

On Thu, Jun 16, 2011 at 12:16 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
> Do we want the Max function to be able to handle nulls? Seems fairly natural
> for it to be able to.
>
> 2011/6/16 Daniel Dai <jiany...@yahoo-inc.com>
>
>> Jonathan is right. math.MAX does not handle null input. Checking for null
>> before feeding values into MAX is necessary.
>>
>> Daniel
>>
>>
>> On 06/16/2011 06:45 AM, Jonathan Coveney wrote:
>>
>>> Can you check if your rank2 or rank3 values are ever null? If they are,
>>> there are some ad hoc fixes which you can do until this is fixed (and it
>>> is easy to fix, just a question of deciding what the desired handling of
>>> null values should be). I would just do something like...
>>>
>>> A = LOAD 'FF23_Filtered.txt' AS (appID, rank2, rank3);
>>> -- drop rows where both ranks are null, then fill a missing rank from the other
>>> B = FILTER A BY NOT (rank2 is null AND rank3 is null);
>>> C = FOREACH B GENERATE appID, (rank2 is null ? rank3 : rank2) as rank2,
>>>     (rank3 is null ? rank2 : rank3) as rank3;
>>>
>>> Obviously you could tweak that for whatever you want to happen if a value
>>> is null.
>>>
>>> 2011/6/16 Jonathan Coveney <jcove...@gmail.com>
>>>
>>>> Hm, just to make sure, I ran this against trunk (to see if it's just a
>>>> 0.7.0 thing or not).
>>>>
>>>> A = LOAD 'test.txt'; --this is just a blank one line file
>>>> B = FOREACH A GENERATE org.apache.pig.piggybank.evaluation.math.MAX(1,null);
>>>>
>>>> I also tested feeding it files from test.txt etc. It fails when there is
>>>> a null value. The cast does not.
>>>>
>>>> 2011/6/16 Lakshminarayana Motamarri <narayana.gupta...@gmail.com>
>>>>
>>>>> Hi all,
>>>>>
>>>>> *I am receiving the following exception:*
>>>>> org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: org.apache.pig.piggybank.evaluation.math.DoubleMax [Caught exception processing input row  [null]]
>>>>>    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
>>>>>    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:263)
>>>>>    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:269)
>>>>>    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>>>>>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
>>>>>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>>>>>    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
>>>>>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>    at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>> Caused by: java.io.IOException: Caught exception processing input row [null]
>>>>>    at org.apache.pig.piggybank.evaluation.math.DoubleMax.exec(DoubleMax.java:70)
>>>>>    at org.apache.pig.piggybank.evaluation.math.DoubleMax.exec(DoubleMax.java:57)
>>>>>    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
>>>>>    ... 10 more
>>>>> Caused by: java.lang.NullPointerException
>>>>>    ... 13 more
>>>>>
>>>>> *My Code:*
>>>>> FFW2 = Load 'final_free_w2.txt';
>>>>> FFW3 = Load 'final_free_w3.txt';
>>>>> FFW2_RankG_RankCate = FOREACH FFW2 GENERATE $0, $4, $3;
>>>>> FFW3_RankG_RankCate = FOREACH FFW3 GENERATE $0, $4, $3;
>>>>> FF23 = JOIN FFW2_RankG_RankCate BY $0, FFW3_RankG_RankCate BY $0;
>>>>> FF23_Filtered = Foreach FF23 Generate $0,$2,$5;
>>>>> STORE FF23_Filtered INTO 'FF23_Filtered.txt';
>>>>>
>>>>> REGISTER /home/training/Desktop/1pig/pig-0.7.0/contrib/piggybank/piggybank.jar
>>>>> A = LOAD 'FF23_Filtered.txt' AS (appID, rank2, rank3);
>>>>> B = FOREACH A GENERATE appID, org.apache.pig.piggybank.evaluation.math.MAX((double)rank2, (double)rank3);
>>>>> store B into 'FF23_FJM.txt';
>>>>>
>>>>>
>>>>> -->  Can anyone please let me know the exact reason for the
>>>>> above exception?
>>>>> I also made sure that the file FF23_Filtered.txt is not NULL.
>>>>>
>>>>> ---
>>>>> Thanks & Regards,
>>>>> Narayan.
>>>>>
>>>>>
>>>>
>>
>
