[ 
https://issues.apache.org/jira/browse/PIG-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488787#comment-13488787
 ] 

Cheolsoo Park commented on PIG-3021:
------------------------------------

Hi Chang,

Thank you very much for raising this issue.

I understand your argument, but this is the expected behavior. Given your 
example, here are the logical plans built by Pig:
{code:title=EXPLAIN b}
b: (Name: LOStore Schema: x#9:int,y#10:int)
|
|---b: (Name: LOSplitOutput Schema: x#9:int,y#10:int)
    |   |
    |   (Name: Equal Type: boolean Uid: 8)
    |   |
    |   |---x:(Name: Project Type: int Uid: 3 Input: 0 Column: 0)
    |   |
    |   |---y:(Name: Project Type: int Uid: 4 Input: 0 Column: 1)
    |
    |---1-4: (Name: LOSplit Schema: x#3:int,y#4:int)
...
{code}
{code:title=EXPLAIN c}
c: (Name: LOStore Schema: x#21:int,y#22:int)
|
|---c: (Name: LOSplitOutput Schema: x#21:int,y#22:int)
    |   |
    |   (Name: Not Type: boolean Uid: 20)
    |   |
    |   |---(Name: Equal Type: boolean Uid: 19)
    |       |
    |       |---x:(Name: Project Type: int Uid: 13 Input: 0 Column: 0)
    |       |
    |       |---y:(Name: Project Type: int Uid: 14 Input: 0 Column: 1)
    |
    |---1-7: (Name: LOSplit Schema: x#13:int,y#14:int)
...
{code}
As can be seen, b is filtered by the expression x == y and c by 
NOT (x == y), and the Pig manual says the following [1]:
- Comparison operators: If either subexpression is null, the result is null.
- FILTER operator: If a filter expression results in null value, the filter 
does not pass them through

So if either x or y is null, both x == y and NOT (x == y) evaluate to null, 
and the tuple is dropped from both b and c.

In addition, the Pig manual also says [2]:
- SPLIT operator: A tuple may not be assigned to any relation.
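
To make the combined effect of these rules concrete, here is a minimal Python sketch that models them. It is not Pig's implementation: the helper names (eq, not_, split) and the sample rows are assumptions made up for illustration, and None stands in for a Pig null.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[int]]

def eq(x: Optional[int], y: Optional[int]) -> Optional[bool]:
    """Null-propagating equality: the result is null if either operand is null."""
    if x is None or y is None:
        return None
    return x == y

def not_(v: Optional[bool]) -> Optional[bool]:
    """Null-propagating NOT: NOT null is still null."""
    return None if v is None else not v

def split(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Model of: split a into b if x==y, c otherwise;
    A row passes a branch only when its condition is exactly true,
    so a null condition sends the row to neither branch."""
    b = [r for r in rows if eq(*r) is True]
    c = [r for r in rows if not_(eq(*r)) is True]
    return b, c

# Hypothetical sample data for relation a(x, y).
a: List[Row] = [(1, 1), (1, 2), (None, 2), (3, None), (None, None)]
b, c = split(a)
dropped = [r for r in a if r not in b and r not in c]
print("b:", b)
print("c:", c)
print("dropped:", dropped)
```

In this model, every row with a null in x or y lands in neither b nor c, which matches the behavior described in the report.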

Does this make sense?

[1] http://pig.apache.org/docs/r0.10.0/basic.html#nulls
[2] http://pig.apache.org/docs/r0.10.0/basic.html#SPLIT
                
> Split results missing records when there is null values in the column 
> comparison
> --------------------------------------------------------------------------------
>
>                 Key: PIG-3021
>                 URL: https://issues.apache.org/jira/browse/PIG-3021
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>            Reporter: Chang Luo
>
> Suppose a(x, y)
> split a into b if x==y, c otherwise;
> One will expect the union of b and c will be a.  However, if x or y is null, 
> the record won't appear in either b or c.
> To workaround this, I have to change to the following:
> split a into b if x is not null and y is not null and x==y, c otherwise;

