Thanks for the nice setup!
I could easily reproduce the exception you are facing.
But that's the only good news so far :-(

I checked the plans and both are valid and should compute the correct
result for the program.
The split-of solution set delta is required because the it needs to be
repartitioned (without the annotation, the optimizer does not know that it
is in fact already correctly partitioned). One thing that made me a bit
suspicious is that the solution set delta partitioning is marked with a
Pipeline-Breaker. The pipeline breaker shouldn't make a semantic
difference, but I am not sure if it is really required and also that part
of the codebase was recently worked on.

So, a closer look and more debugging is necessary to figure out what not
working correctly here...


2015-04-03 14:14 GMT+02:00 Vasiliki Kalavri <vasilikikala...@gmail.com>:

> Hi Fabian,
>
> I am using the dblp co-authorship dataset from SNAP:
> http://snap.stanford.edu/data/com-DBLP.html
> I also pushed my slightly modified version of ConnectedComponents, here:
> https://github.com/vasia/flink/tree/cc-test. It basically generates the
> vertex dataset from the edges, so that you don't need to create it
> separately.
> The annotation that creates the error is in line #172.
>
> Thanks a lot :))
>
> -Vasia.
>
>
> On 3 April 2015 at 13:09, Fabian Hueske <fhue...@gmail.com> wrote:
>
> > That looks pretty much like a bug.
> >
> > As you said, fwd fields annotations are optional and may improve the
> > performance of a program, but never change its semantics (if set
> > correctly).
> >
> > I'll have a look at it later.
> > Would be great if you could provide some data to reproduce the bug.
> > On Apr 3, 2015 12:48 PM, "Vasiliki Kalavri" <vasilikikala...@gmail.com>
> > wrote:
> >
> > > Hello to my squirrels,
> > >
> > > I've been getting a NullPointerException for a DeltaIteration program
> I'm
> > > trying to implement and I could really use your help :-)
> > > It seems that some of the input Tuples of the Join operator that I'm
> > using
> > > to create the next workset / solution set delta are null.
> > > It also seems that adding ForwardedFields annotations solves the issue.
> > >
> > > I managed to reproduce the behavior using the ConnectedComponents
> > example,
> > > by removing the "@ForwardedFieldsFirst("*")" annotation from
> > > the ComponentIdFilter join.
> > > The exception message is the following:
> > >
> > > Caused by: java.lang.NullPointerException
> > > at
> > >
> > >
> >
> org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:186)
> > > at
> > >
> > >
> >
> org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:1)
> > > at
> > >
> > >
> >
> org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:198)
> > > at
> > >
> > >
> >
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
> > > at
> > >
> > >
> >
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
> > > at
> > >
> > >
> >
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
> > > at
> > >
> > >
> >
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
> > > at
> > >
> > >
> >
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
> > > at java.lang.Thread.run(Thread.java:745)
> > >
> > > I get this error locally with any sufficiently big dataset (~10000
> > nodes).
> > > When the annotation is in place, it works without problem.
> > > I also generated the optimizer plans for the two cases:
> > > - with annotation (working):
> > > https://gist.github.com/vasia/4f4dc6b0cc6c72b5b64b
> > > - without annotation (failing):
> > > https://gist.github.com/vasia/086faa45b980bf7f4c09
> > >
> > > After visualizing the plans, the main difference I see is that in the
> > > working case, the next workset node and the solution set delta nodes
> are
> > > merged, while in the failing case they are separate.
> > >
> > > Shouldn't this work with and without annotation (but be more efficient
> > with
> > > the annotation in place)? Or am I missing something here?
> > >
> > > Thanks in advance for any help :))
> > >
> > > Cheers,
> > > - Vasia.
> > >
> >
>

Reply via email to