Hi Marton,

a Java stack trace taken with jstack can help to identify where the code got stuck. Could you open a JIRA issue and post the stack trace there?
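For reference, the usual route is to find the process id of the hanging JVM with `jps` and then run `jstack <pid>`. If attaching from outside is inconvenient, roughly the same information can be collected from inside the JVM. The following is only a minimal sketch using standard JDK APIs; the class name `ThreadDump` and its methods are made up for illustration and are not part of Flink or Stratosphere.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;

// Hypothetical helper (not part of Flink): prints a jstack-like dump of this JVM.
public class ThreadDump {

    /** Builds a text dump with the name, state and stack frames of every live thread. */
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Returns how many threads the JVM reports as deadlocked (0 if none are detected). */
    public static int deadlockedThreadCount() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is detected
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.print(dump());
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Threads that stay in WAITING or BLOCKED across several dumps taken a few seconds apart are the likely place where the job hangs. A count of 0 from `deadlockedThreadCount()` only rules out monitor and lock cycles; it says nothing about threads waiting on data that never arrives.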
Cheers, Fabian

2014-09-04 17:25 GMT+02:00 Márton Balassi <[email protected]>:

> CPU load:
>
> Tested on my 4-core machine, the CPU load spikes up at the beginning of the
> job and stays relatively high during the whole job when run with version
> 0.5, then finishes gracefully. On version 0.6 it works seemingly well until
> the hangup. Interestingly enough, even when no more log messages appear, my
> CPU utilization stays 10-15% higher per core than without running the job.
>
> logs:
>
> For both implementations it starts like this:
>
> 09/04/2014 17:05:51: Job execution switched to status SCHEDULED
> 09/04/2014 17:05:51: DataSource (CSV Input (|) /home/mbalassi/git/als-comparison/data/sampledb2b.csv.txt) (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: Reduce(Create q as a random matrix) (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: PartialSolution (BulkIteration (Bulk Iteration)) (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: Join(Sends the columns of q with multiple keys) (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: CoGroup (For fixed q calculates optimal p) (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: Join(Sends the rows of p with multiple keys)) (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: CoGroup (For fixed p calculates optimal q) (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: Fake Tail (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: Join(Sends the columns of q with multiple keys) (1/1) switched to SCHEDULED
> 09/04/2014 17:05:51: CoGroup (For fixed q calculates optimal p) (1/1) switched to SCHEDULED
>
> [Omitted quite some healthy messages...]
>
> 09/04/2014 17:05:53: Join(Sends the rows of p with multiple keys)) (1/1) switched to READY
> 09/04/2014 17:05:53: Join(Sends the rows of p with multiple keys)) (1/1) switched to STARTING
> 09/04/2014 17:05:53: Join(Sends the rows of p with multiple keys)) (1/1) switched to RUNNING
> 09/04/2014 17:05:53: CoGroup (For fixed p calculates optimal q) (1/1) switched to READY
> 09/04/2014 17:05:53: Fake Tail (1/1) switched to READY
> 09/04/2014 17:05:53: CoGroup (For fixed p calculates optimal q) (1/1) switched to STARTING
> 09/04/2014 17:05:53: Fake Tail (1/1) switched to STARTING
> 09/04/2014 17:05:54: CoGroup (For fixed p calculates optimal q) (1/1) switched to RUNNING
> 09/04/2014 17:05:54: Fake Tail (1/1) switched to RUNNING
> 09/04/2014 17:05:54: Join(Sends the columns of q with multiple keys) (1/1) switched to READY
> 09/04/2014 17:05:54: Join(Sends the columns of q with multiple keys) (1/1) switched to STARTING
> 09/04/2014 17:05:54: Join(Sends the columns of q with multiple keys) (1/1) switched to RUNNING
> 09/04/2014 17:05:54: CoGroup (For fixed q calculates optimal p) (1/1) switched to READY
> 09/04/2014 17:05:54: CoGroup (For fixed q calculates optimal p) (1/1) switched to STARTING
> 09/04/2014 17:05:55: CoGroup (For fixed q calculates optimal p) (1/1) switched to RUNNING
>
> Flink stops here, Strato continues:
>
> 09/04/2014 17:09:01: DataSource(CSV Input (|)) (1/1) switched to FINISHING
> 09/04/2014 17:09:02: PartialSolution (BulkIteration (Bulk Iteration)) (1/1) switched to READY
> 09/04/2014 17:09:02: PartialSolution (BulkIteration (Bulk Iteration)) (1/1) switched to STARTING
> 09/04/2014 17:09:02: PartialSolution (BulkIteration (Bulk Iteration)) (1/1) switched to RUNNING
> 09/04/2014 17:09:03: Reduce(Create q as a random matrix) (1/1) switched to FINISHING
> 09/04/2014 17:09:05: Sync(BulkIteration (Bulk Iteration)) (1/1) switched to READY
> 09/04/2014 17:09:05: Sync(BulkIteration (Bulk Iteration)) (1/1) switched to STARTING
> 09/04/2014 17:09:05: Sync(BulkIteration (Bulk Iteration)) (1/1) switched to RUNNING
> 09/04/2014 17:09:09: Sync(BulkIteration (Bulk Iteration)) (1/1) switched to FINISHING
> 09/04/2014 17:09:09: DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@7ea742a1) (1/1) switched to READY
> 09/04/2014 17:09:09: DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@7ea742a1) (1/1) switched to STARTING
>
> [Omitted quite some healthy messages...]
>
> 09/04/2014 17:09:10: PartialSolution (BulkIteration (Bulk Iteration)) (1/1) switched to FINISHED
> 09/04/2014 17:09:10: CoGroup(For fixed p calculates optimal q) (1/1) switched to FINISHED
> 09/04/2014 17:09:10: DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@5dcde3f3) (1/1) switched to RUNNING
> 09/04/2014 17:09:10: CoGroup(For fixed q calculates optimal p) (1/1) switched to FINISHING
> 09/04/2014 17:09:10: DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@5dcde3f3) (1/1) switched to FINISHING
> 09/04/2014 17:09:11: Join(Sends the columns of q with multiple keys) (1/1) switched to FINISHED
> 09/04/2014 17:09:11: CoGroup(For fixed q calculates optimal p) (1/1) switched to FINISHED
> 09/04/2014 17:09:11: DataSink(hu.sztaki.ilab.cumulonimbus.als_comparison.strato.ColumnOutputFormatStrato@5dcde3f3) (1/1) switched to FINISHED
> 09/04/2014 17:09:11: Job execution switched to status FINISHED
>
> On Thu, Sep 4, 2014 at 3:33 PM, Ufuk Celebi <[email protected]> wrote:
>
> > Hey Marton,
> >
> > thanks for reporting the issue and the link to the repo to reproduce the
> > problem. I will look into it later today.
> >
> > If you like, you could provide some more information in the meantime:
> >
> > - How is the CPU load?
> > - What are the TM logs saying?
> > - Can you give a stack trace? Where is it hanging?
> >
> > On Thu, Sep 4, 2014 at 3:14 PM, Márton Balassi <[email protected]> wrote:
> >
> > > Hey,
> > >
> > > We managed to produce code for which the legacy Stratosphere 0.5 release
> > > implementation works nicely, whereas the updated Flink 0.6 release
> > > implementation hangs for slightly larger inputs.
> > >
> > > Please check out the issue here:
> > > https://github.com/mbalassi/als-comparison
> > >
> > > Any suggestions are welcome.
> > >
> > > Cheers,
> > >
> > > Marton
