Hi Marton,

a jstack Java stacktrace can help to identify where the code got stuck.
Can you open a JIRA and post a stacktrace there?

Cheers, Fabian


2014-09-04 17:25 GMT+02:00 Márton Balassi <[email protected]>:

> CPU load:
>
> Tested on my 4-core machine the CPU load spikes up at the beginning of the
> job and stays relatively high during the whole job when run with version
> 0.5, then finishes gracefully. On version 0.6 it works seemingly well until
> the hangup. Interestingly enough even when no more log messages appear my
> CPU utilization stays 10-15% higher per core then without running the job.
>
> logs:
>
> For both the implementation it starts like this:
>
> 09/04/2014 17:05:51: Job execution switched to status SCHEDULED
> 09/04/2014 17:05:51: DataSource (CSV Input (|)
> /home/mbalassi/git/als-comparison/data/sampledb2b.csv.txt) (1/1) switched
> to SCHEDULED
> 09/04/2014 17:05:51: Reduce(Create q as a random matrix) (1/1) switched to
> SCHEDULED
> 09/04/2014 17:05:51: PartialSolution (BulkIteration (Bulk Iteration)) (1/1)
> switched to SCHEDULED
> 09/04/2014 17:05:51: Join(Sends the columns of q with multiple keys) (1/1)
> switched to SCHEDULED
> 09/04/2014 17:05:51: CoGroup (For fixed q calculates optimal p) (1/1)
> switched to SCHEDULED
> 09/04/2014 17:05:51: Join(Sends the rows of p with multiple keys)) (1/1)
> switched to SCHEDULED
> 09/04/2014 17:05:51: CoGroup (For fixed p calculates optimal q) (1/1)
> switched to SCHEDULED
> 09/04/2014 17:05:51: Fake Tail (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: Join(Sends the columns of q with multiple keys) (1/1)
> switched to SCHEDULED
> 09/04/2014 17:05:51: CoGroup (For fixed q calculates optimal p) (1/1)
> switched to SCHEDULED
>
> [Omitted quite some healthy messages...]
>
> 09/04/2014 17:05:53: Join(Sends the rows of p with multiple keys)) (1/1)
> switched to READY
> 09/04/2014 17:05:53: Join(Sends the rows of p with multiple keys)) (1/1)
> switched to STARTING
> 09/04/2014 17:05:53: Join(Sends the rows of p with multiple keys)) (1/1)
> switched to RUNNING
> 09/04/2014 17:05:53: CoGroup (For fixed p calculates optimal q) (1/1)
> switched to READY
> 09/04/2014 17:05:53: Fake Tail (1/1) switched to READY
> 09/04/2014 17:05:53: CoGroup (For fixed p calculates optimal q) (1/1)
> switched to STARTING
> 09/04/2014 17:05:53: Fake Tail (1/1) switched to STARTING
> 09/04/2014 17:05:54: CoGroup (For fixed p calculates optimal q) (1/1)
> switched to RUNNING
> 09/04/2014 17:05:54: Fake Tail (1/1) switched to RUNNING
> 09/04/2014 17:05:54: Join(Sends the columns of q with multiple keys) (1/1)
> switched to READY
> 09/04/2014 17:05:54: Join(Sends the columns of q with multiple keys) (1/1)
> switched to STARTING
> 09/04/2014 17:05:54: Join(Sends the columns of q with multiple keys) (1/1)
> switched to RUNNING
> 09/04/2014 17:05:54: CoGroup (For fixed q calculates optimal p) (1/1)
> switched to READY
> 09/04/2014 17:05:54: CoGroup (For fixed q calculates optimal p) (1/1)
> switched to STARTING
> 09/04/2014 17:05:55: CoGroup (For fixed q calculates optimal p) (1/1)
> switched to RUNNING
>
> Flink stops here, Strato continues:
>
> 09/04/2014 17:09:01: DataSource(CSV Input (|)) (1/1) switched to FINISHING
> 09/04/2014 17:09:02: PartialSolution (BulkIteration (Bulk Iteration)) (1/1)
> switched to READY
> 09/04/2014 17:09:02: PartialSolution (BulkIteration (Bulk Iteration)) (1/1)
> switched to STARTING
> 09/04/2014 17:09:02: PartialSolution (BulkIteration (Bulk Iteration)) (1/1)
> switched to RUNNING
> 09/04/2014 17:09:03: Reduce(Create q as a random matrix) (1/1) switched to
> FINISHING
> 09/04/2014 17:09:05: Sync(BulkIteration (Bulk Iteration)) (1/1) switched to
> READY
> 09/04/2014 17:09:05: Sync(BulkIteration (Bulk Iteration)) (1/1) switched to
> STARTING
> 09/04/2014 17:09:05: Sync(BulkIteration (Bulk Iteration)) (1/1) switched to
> RUNNING
> 09/04/2014 17:09:09: Sync(BulkIteration (Bulk Iteration)) (1/1) switched to
> FINISHING
> 09/04/2014 17:09:09:
>
> DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@7ea742a1
> )
> (1/1) switched to READY
> 09/04/2014 17:09:09:
>
> DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@7ea742a1
> )
> (1/1) switched to STARTING
>
> [Omitted quite some healthy messages...]
>
> 09/04/2014 17:09:10: PartialSolution (BulkIteration (Bulk Iteration)) (1/1)
> switched to FINISHED
> 09/04/2014 17:09:10: CoGroup(For fixed p calculates optimal q) (1/1)
> switched to FINISHED
> 09/04/2014 17:09:10:
>
> DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@5dcde3f3
> )
> (1/1) switched to RUNNING
> 09/04/2014 17:09:10: CoGroup(For fixed q calculates optimal p) (1/1)
> switched to FINISHING
> 09/04/2014 17:09:10:
>
> DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@5dcde3f3
> )
> (1/1) switched to FINISHING
> 09/04/2014 17:09:11: Join(Sends the columns of q with multiple keys) (1/1)
> switched to FINISHED
> 09/04/2014 17:09:11: CoGroup(For fixed q calculates optimal p) (1/1)
> switched to FINISHED
> 09/04/2014 17:09:11:
>
> DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@5dcde3f3
> )
> (1/1) switched to FINISHED
> 09/04/2014 17:09:11: Job execution switched to status FINISHED
>
>
>
>
>
> On Thu, Sep 4, 2014 at 3:33 PM, Ufuk Celebi <[email protected]> wrote:
>
> > Hey Marton,
> >
> > thanks for reporting the issue and the link to the repo to reproduce the
> > problem. I will look into it later today.
> >
> > If you like, you could provide some more information in the meantime:
> >
> > - How the CPU load?
> > - What are TM logs saying?
> > - Can you give a stack trace? Where is it hanging?
> >
> >
> >
> > On Thu, Sep 4, 2014 at 3:14 PM, Márton Balassi <[email protected]
> >
> > wrote:
> >
> > > Hey,
> > >
> > > We managed to produce a code, for which the legacy Stratophere 0.5
> > release
> > > implementation works nicely, however the updated Flink 0.6 release
> > > implementation hangs up for slightly larger inputs.
> > >
> > >
> > > Please check out the issue here:
> > > https://github.com/mbalassi/als-comparison
> > >
> > > Any suggestions are welcome.
> > >
> > > Cheers,
> > >
> > > Marton
> > >
> >
>

Reply via email to