[ 
https://issues.apache.org/jira/browse/FLINK-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653649#comment-14653649
 ] 

ASF GitHub Bot commented on FLINK-2447:
---------------------------------------

GitHub user twalthr opened a pull request:

    https://github.com/apache/flink/pull/986

    [FLINK-2447] [java api] TypeExtractor returns wrong type info when a Tuple 
has two fields of the same POJO type

    This fixes FLINK-2447 and simplifies the TypeExtractor a little bit.
    
    Actually, the `alreadySeen` variable was unnecessary. The `typeHierarchy` 
can easily be used for that instead. I think the problem was that the developers 
of the POJO feature did not know the concept behind the `typeHierarchy` 
parameter. I added a note so that such bugs (hopefully) don't happen again.
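
The idea of using the hierarchy for recursion detection can be sketched as follows. This is a minimal, self-contained model and not the actual TypeExtractor code: the `Node` class, the `extract` method and the string output format are hypothetical and exist only for illustration. A type counts as recursive only if it is already on the path from the root to the current field, so the same POJO appearing in two sibling fields is not affected.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TypeHierarchySketch {

    /** Hypothetical stand-in for a Java type: a name plus the types of its fields. */
    static final class Node {
        final String name;
        final List<Node> fields = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    /**
     * Describes a type. The path holds only the types currently being
     * processed (root to current field), not every type seen so far.
     */
    static String extract(Node type, Deque<String> path) {
        if (path.contains(type.name)) {
            // The type encloses itself: fall back to a generic type.
            return "GenericType<" + type.name + ">";
        }
        if (type.fields.isEmpty()) {
            return type.name; // leaf type such as Integer
        }
        path.push(type.name);
        try {
            List<String> parts = new ArrayList<>();
            for (Node field : type.fields) {
                parts.add(extract(field, path));
            }
            return type.name + "<" + String.join(", ", parts) + ">";
        } finally {
            // Leaving this type: siblings must not see it as an ancestor.
            path.pop();
        }
    }

    public static void main(String[] args) {
        Node integer = new Node("Integer");

        Node pojo = new Node("FooBarPojo");
        pojo.fields.add(integer);
        pojo.fields.add(integer);

        Node tuple = new Node("Tuple2");
        tuple.fields.add(pojo);
        tuple.fields.add(pojo);

        // Both tuple fields are described identically.
        System.out.println(extract(tuple, new ArrayDeque<>()));

        // A genuinely recursive type is still detected.
        Node listNode = new Node("ListNode");
        listNode.fields.add(integer);
        listNode.fields.add(listNode);
        System.out.println(extract(listNode, new ArrayDeque<>()));
    }
}
```

Popping the type in the `finally` block is the design choice that matters here: the path shrinks again once a field has been handled, which a set that only ever grows cannot do.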

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink PojoTypeExtrBug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/986.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #986
    
----
commit e05081dc4c445cc13fbe1addaf433cd354a50dc9
Author: twalthr <[email protected]>
Date:   2015-08-04T13:30:28Z

    [FLINK-2447] [java api] TypeExtractor returns wrong type info when a Tuple 
has two fields of the same POJO type

----


> TypeExtractor returns wrong type info when a Tuple has two fields of the same 
> POJO type
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-2447
>                 URL: https://issues.apache.org/jira/browse/FLINK-2447
>             Project: Flink
>          Issue Type: Bug
>          Components: Java API
>            Reporter: Gabor Gevay
>            Assignee: Timo Walther
>             Fix For: 0.10, 0.9.1
>
>
> Consider the following code:
>
>     DataSet<FooBarPojo> d1 = env.fromElements(new FooBarPojo());
>     DataSet<Tuple2<FooBarPojo, FooBarPojo>> d2 = d1.map(
>             new MapFunction<FooBarPojo, Tuple2<FooBarPojo, FooBarPojo>>() {
>         @Override
>         public Tuple2<FooBarPojo, FooBarPojo> map(FooBarPojo value) throws Exception {
>             return null;
>         }
>     });
>
> where FooBarPojo is the following type:
>
>     public class FooBarPojo {
>         public int foo, bar;
>         public FooBarPojo() {}
>     }
>
> This should print a tuple type with two identical fields:
> Java Tuple2<PojoType<FooBarPojo, fields = [bar: Integer, foo: Integer]>, 
> PojoType<FooBarPojo, fields = [bar: Integer, foo: Integer]>>
> But it prints the following instead:
> Java Tuple2<PojoType<FooBarPojo, fields = [bar: Integer, foo: Integer]>, 
> GenericType<FooBarPojo>>
> Note that this problem causes some co-groups in Gelly to crash with 
> "org.apache.flink.api.common.InvalidProgramException: The pair of co-group 
> keys are not compatible with each other" when the vertex ID type is a POJO: 
> the second field of the Edge type becomes a generic type, while the POJO is 
> recognized in the Vertex type, so getNumberOfKeyFields returns different 
> numbers for the POJO and the generic type.
> The source of the problem is the mechanism in TypeExtractor that is meant to 
> detect recursive types (see the "alreadySeen" field in TypeExtractor): it 
> mistakes the second appearance of FooBarPojo for a recursive field.
> Specifically, the following happens: createTypeInfoWithTypeHierarchy starts to 
> process the Tuple2<FooBarPojo, FooBarPojo> type and, in line 434, calls 
> itself for the first field. That call proceeds into the privateGetForClass 
> case, which correctly detects a POJO and returns a PojoTypeInfo; in the 
> meantime, however, privateGetForClass adds PojoTypeInfo to "alreadySeen" in 
> line 1191. When the outer createTypeInfoWithTypeHierarchy then reaches the 
> second field, it goes into privateGetForClass again, which mistakenly 
> returns a GenericTypeInfo because, in line 1187, it thinks that a recursive 
> type is being processed.
> (Note that if we comment out the recursive type detection, i.e. the lines 
> that use the alreadySeen field, then the output is correct.)
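
The failure mode described above can be reproduced in a small stand-alone model. This is only a sketch under simplifying assumptions, not the TypeExtractor code: the `Node` class and `extractWithGlobalSet` method are hypothetical, and POJO fields are not inspected. It shows how a single set shared across the whole extraction run flags the second, non-recursive occurrence of a POJO.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AlreadySeenSketch {

    /** Hypothetical stand-in for a Java type. */
    static final class Node {
        final String name;
        final boolean pojo;
        final List<Node> fields;

        Node(String name, boolean pojo, Node... fields) {
            this.name = name;
            this.pojo = pojo;
            this.fields = Arrays.asList(fields);
        }
    }

    /** Mimics the reported behaviour: one set shared by the whole run. */
    static String extractWithGlobalSet(Node type, Set<String> alreadySeen) {
        if (type.pojo) {
            if (!alreadySeen.add(type.name)) {
                // Seen anywhere before, so it is (wrongly) treated as recursive.
                return "GenericType<" + type.name + ">";
            }
            return "PojoType<" + type.name + ">";
        }
        List<String> parts = new ArrayList<>();
        for (Node field : type.fields) {
            parts.add(extractWithGlobalSet(field, alreadySeen));
        }
        return type.name + "<" + String.join(", ", parts) + ">";
    }

    public static void main(String[] args) {
        Node pojo = new Node("FooBarPojo", true);
        Node tuple = new Node("Tuple2", false, pojo, pojo);
        // prints: Tuple2<PojoType<FooBarPojo>, GenericType<FooBarPojo>>
        System.out.println(extractWithGlobalSet(tuple, new HashSet<>()));
    }
}
```

The set is never cleared between sibling fields, so the second FooBarPojo is indistinguishable from a type that contains itself.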



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
