[ 
https://issues.apache.org/jira/browse/PIG-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982742#comment-13982742
 ] 

Hongchang Li commented on PIG-3909:
-----------------------------------

Sorry for the wrong format. Reattaching the LP.
{noformat}
#-----------------------------------------------
# New Logical Plan:
#-----------------------------------------------
D: (Name: LOStore Schema: #21:chararray)
|
|---D: (Name: LOForEach Schema: #21:chararray)
    |   |
    |   (Name: LOGenerate[false] Schema: #21:chararray)
    |   |   |
    |   |   (Name: Cast Type: chararray Uid: 21)
    |   |   |
    |   |   |---(Name: Map Type: bytearray Uid: 21 Key: fieldkey2)
    |   |       |
    |   |       |---(Name: Cast Type: map Uid: 16)
    |   |           |
    |   |           |---bagofmap:(Name: Project Type: bytearray Uid: 16 Input: 0 Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: bagofmap#26:bytearray)
    |
    |---C: (Name: LOFilter Schema: bagofmap#26:bytearray)
        |   |
        |   (Name: Regex Type: boolean Uid: 20)
        |   |
        |   |---(Name: Cast Type: chararray Uid: 17)
        |   |   |
        |   |   |---(Name: Map Type: bytearray Uid: 17 Key: fieldkey1)
        |   |       |
        |   |       |---(Name: Cast Type: map Uid: 26)
        |   |           |
        |   |           |---bagofmap:(Name: Project Type: bytearray Uid: 26 Input: 0 Column: 0)
        |   |
        |   |---(Name: Constant Type: chararray Uid: 19)
        |
        |---B: (Name: LOForEach Schema: bagofmap#26:bytearray)
            |   |
            |   (Name: LOGenerate[true] Schema: bagofmap#26:bytearray)
            |   |   |
            |   |   (Name: BinCond Type: bag Uid: 16)
            |   |   |
            |   |   |---(Name: UserFunc(org.apache.pig.builtin.IsEmpty) Type: boolean Uid: 14)
            |   |   |   |
            |   |   |   |---bagofmap:(Name: Project Type: bag Uid: 12 Input: 0 Column: (*))
            |   |   |
            |   |   |---(Name: Cast Type: bag Uid: 15)
            |   |   |   |
            |   |   |   |---(Name: Constant Type: bytearray Uid: 15)
            |   |   |
            |   |   |---bagofmap:(Name: Project Type: bag Uid: 12 Input: 1 Column: (*))
            |   |
            |   |---bagofmap: (Name: LOInnerLoad[0] Schema: null)
            |   |
            |   |---bagofmap: (Name: LOInnerLoad[0] Schema: null)
            |
            |---A: (Name: LOLoad Schema: bagofmap#12:bag{#13:tuple()})RequiredFields:null
{noformat}

> Type Casting issue
> ------------------
>
>                 Key: PIG-3909
>                 URL: https://issues.apache.org/jira/browse/PIG-3909
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.12.0, 0.11.1
>            Reporter: Hongchang Li
>            Assignee: Daniel Dai
>             Fix For: 0.13.0
>
>         Attachments: PIG-3909-1.patch
>
>
>   This issue is very close to https://issues.apache.org/jira/browse/PIG-1191,
> which was closed for version 0.6.0. Steps to reproduce the issue:
>   Pig script:
> {code:title=input7.pig}
> A = load 'polisan/input7.txt' as (bagofmap:{});
> B = foreach A generate FLATTEN((IsEmpty(bagofmap) ? null : bagofmap)) AS bagofmap;
> C = filter B by (chararray)bagofmap#'fieldkey1' matches 'po.*';
> D = foreach C generate (chararray)bagofmap#'fieldkey2';
> dump D;
> {code}
>   Input data:
> {code:title=polisan/input7.txt}
> {([fieldkey1#polisan,fieldkey2#lily])}
> {code}
>   Run the command "pig -x local -f input7.pig". The following exception is
> thrown:
> {noformat}
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string.
>       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:935)
> {noformat}
>   I dug into the source code and found that something goes wrong when the
> Logical Plan (LP) is generated, particularly in "new TypeCheckingRelVisitor(lp,
> collector).visit();". For the Pig script pasted in this ticket, the logical
> plan looked like this:
> {noformat}
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> D: (Name: LOStore Schema: #21:chararray)
> |
> |---D: (Name: LOForEach Schema: #21:chararray)
>     |   |
>     |   (Name: LOGenerate[false] Schema: #21:chararray)
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 21)
>     |   |   |
>     |   |   |---(Name: Map Type: bytearray Uid: 21 Key: fieldkey2)
>     |   |       |
>     |   |       |---(Name: Cast Type: map Uid: 16)
>     |   |           |
>     |   |           |---bagofmap:(Name: Project Type: bytearray Uid: 16 Input: 0 Column: (*))
>     |   |
>     |   |---(Name: LOInnerLoad[0] Schema: bagofmap#16:bytearray)
>     |
>     |---C: (Name: LOFilter Schema: bagofmap#16:bytearray)
>         |   |
>         |   (Name: Regex Type: boolean Uid: 20)
>         |   |
>         |   |---(Name: Cast Type: chararray Uid: 17)
>         |   |   |
>         |   |   |---(Name: Map Type: bytearray Uid: 17 Key: fieldkey1)
>         |   |       |
>         |   |       |---(Name: Cast Type: map Uid: 26)        --> Uid is 26 here, while the other places use 16
>         |   |           |
>         |   |           |---bagofmap:(Name: Project Type: bytearray Uid: 16 Input: 0 Column: 0)
>         |   |
>         |   |---(Name: Constant Type: chararray Uid: 19)
>         |
>         |---B: (Name: LOForEach Schema: bagofmap#16:bytearray)
>             |   |
>             |   (Name: LOGenerate[true] Schema: bagofmap#16:bytearray)
>             |   |   |
>             |   |   (Name: BinCond Type: bag Uid: 16)
>             |   |   |
>             |   |   |---(Name: UserFunc(org.apache.pig.builtin.IsEmpty) Type: boolean Uid: 14)
>             |   |   |   |
>             |   |   |   |---bagofmap:(Name: Project Type: bag Uid: 12 Input: 0 Column: (*))
>             |   |   |
>             |   |   |---(Name: Cast Type: bag Uid: 15)
>             |   |   |   |
>             |   |   |   |---(Name: Constant Type: bytearray Uid: 15)
>             |   |   |
>             |   |   |---bagofmap:(Name: Project Type: bag Uid: 12 Input: 1 Column: (*))
>             |   |
>             |   |---bagofmap: (Name: LOInnerLoad[0] Schema: null)
>             |   |
>             |   |---bagofmap: (Name: LOInnerLoad[0] Schema: null)
>             |
>             |---A: (Name: LOLoad Schema: bagofmap#12:bag{#13:tuple()})RequiredFields:null
> {noformat}
>   I followed the code and found that initially every uid of bagofmap was 16.
> Then TypeCheckingRelVisitor.visit() was called, which inserted some casts
> (e.g., to cast bagofmap from bytearray to map) and recalculated the uids.
> When alias C was processed, the uid of bagofmap (bytearray type) was changed
> to 26, and bagofmap in the inserted CastExpression was also assigned 26.
> While processing D, the foreach statement, bagofmap in the project
> expression was merged back to 16, while the other bytearray bagofmap fields
> shared the same schema object, leaving only the map-typed one in the filter
> statement at 26. As a result, the load function for uid 26 was missing from
> uid2LoadFuncMap, so the caster was set to null, which finally caused the
> exception.
>   I tried several ways to make the code work.
> 1) Add an implementation of visit(CastExpression) to class
> LineageFindExpVisitor that adds <26, org.apache.pig.builtin.PigStorage()> to
> uid2LoadFuncMap, so that the caster is assigned the right function.
> {code:java}
>         @Override
>         public void visit(CastExpression cast) throws FrontendException {
>             updateUidMap(cast, cast.getExpression());
>         }
> {code}
> 2) Hack getFieldSchema() of class ProjectExpression so that when the uids of
> bagofmap are recalculated, 26 is not merged back to 16; then <26,
> org.apache.pig.builtin.PigStorage()> is put into uid2LoadFuncMap when
> lineageFinder.visit() is called to build the map.
> 3) Run the script in debug mode in Eclipse and hack the result of
> mergeUid() so that every uid 26 is merged back to 16; then <16,
> org.apache.pig.builtin.PigStorage()> in uid2LoadFuncMap is enough.
>   I'm not sure which of these is acceptable or preferred, if any. But I
> believe the generated LP is not correct, and that there is a bug in the
> getFieldSchema() function of the ProjectExpression class. Please confirm.
>   Besides, I wonder what uidOnlyFieldSchema and fieldSchema mean for a
> LogicalExpression and how exactly they differ. That would help me better
> understand the implementation of getFieldSchema(): when cloneUid() should be
> called, when mergeUid() should be called, and when getNextUid() should be.
>   Thanks.
>   
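
The failure mode described above, and workaround 1, can be sketched in isolation. This is a toy model only, not Pig's code: the class LineageSketch and its methods register, findLoadFunc, and propagate are hypothetical names, the map merely stands in for uid2LoadFuncMap, and the idea that a cast should inherit the lineage of the expression it wraps is an assumption about what updateUidMap is meant to achieve.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a uid -> load function lineage map. NOT Pig's real classes.
class LineageSketch {
    // Hypothetical stand-in for uid2LoadFuncMap.
    private final Map<Long, String> uid2LoadFunc = new HashMap<>();

    // Record which load function produced the field with this uid.
    void register(long uid, String loadFuncSpec) {
        uid2LoadFunc.put(uid, loadFuncSpec);
    }

    // Returns the load function spec for the uid, or null when no lineage
    // was recorded (the situation the issue says ends in ERROR 1075).
    String findLoadFunc(long uid) {
        return uid2LoadFunc.get(uid);
    }

    // Assumed behaviour of workaround 1: the cast's uid inherits the
    // lineage of the expression it wraps, if that lineage is known.
    void propagate(long castUid, long inputUid) {
        String spec = uid2LoadFunc.get(inputUid);
        if (spec != null) {
            uid2LoadFunc.put(castUid, spec);
        }
    }

    public static void main(String[] args) {
        LineageSketch lineage = new LineageSketch();
        lineage.register(16L, "org.apache.pig.builtin.PigStorage()");

        // Only uid 16 has lineage, so the lookup for uid 26 finds nothing.
        System.out.println(lineage.findLoadFunc(26L));

        // After propagating from the cast's input, uid 26 resolves too.
        lineage.propagate(26L, 16L);
        System.out.println(lineage.findLoadFunc(26L));
    }
}
```

With lineage registered for uid 16 only, the lookup for uid 26 returns null; once the cast's uid is mapped to the same load function, the lookup succeeds. Whether this belongs in the lineage visitor or the uids should never diverge in the first place is exactly the open question in the issue.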



--
This message was sent by Atlassian JIRA
(v6.2#6252)
