If you can attach YourKit to the node, it might reveal more about what's going on. There isn't a single node running both the NC and the CC, right?
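
If attaching a profiler isn't practical, a cruder first check is to see which threads are burning the CPU. A plain `jstack <pid>` thread dump of the busy NC would show what each thread is doing. Below is a minimal sketch of the same idea using the standard `java.lang.management` API. The class name `HotThreads` and its `Row` type are made up for illustration and are not part of Hyracks. It only reports on the JVM it runs in, so it would have to run inside the NC process (or be adapted to a remote JMX connection, which I'm not showing here).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Hyracks: ranks the threads of the
// current JVM by the CPU time they have consumed so far.
public class HotThreads {

    /** One report row: thread name, current state, CPU time in nanoseconds. */
    public static final class Row {
        public final String name;
        public final Thread.State state;
        public final long cpuNanos;

        Row(String name, Thread.State state, long cpuNanos) {
            this.name = name;
            this.state = state;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns up to {@code limit} live threads, highest CPU time first. */
    public static List<Row> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Row> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return rows; // this JVM cannot measure per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);     // -1 if the thread has died
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread has died
            if (cpu < 0 || info == null) {
                continue;
            }
            rows.add(new Row(info.getThreadName(), info.getThreadState(), cpu));
        }
        rows.sort(Comparator.comparingLong((Row r) -> r.cpuNanos).reversed());
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    public static void main(String[] args) {
        for (Row r : topThreads(10)) {
            System.out.printf("%-50s %-15s %d ms%n",
                    r.name, r.state, r.cpuNanos / 1_000_000);
        }
    }
}
```

If one operator thread dominates the list while the others sit idle, that would at least narrow down where the time is going before reaching for a full profiler.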

On Thu, Dec 1, 2016 at 10:14 PM, Taewoo Kim <[email protected]> wrote:

> PS: It took 2 more hours to finish the job on one NC. I wonder why this
> happens.
>
> Dec 01, 2016 7:19:35 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: NotifyTaskComplete
> Dec 01, 2016 9:11:23 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> INFO: Executing: CleanupJoblet
> Dec 01, 2016 9:11:23 PM org.apache.hyracks.control.nc.work.CleanupJobletWork run
> INFO: Cleaning up after job: JID:4
> Dec 01, 2016 9:11:23 PM org.apache.hyracks.control.nc.Joblet close
> WARNING: Freeing leaked 54919521 bytes
>
> Best,
> Taewoo
>
> On Thu, Dec 1, 2016 at 8:39 PM, Taewoo Kim <[email protected]> wrote:
>
> > Hi All,
> >
> > Have you experienced this case?
> >
> > I have 9 NCs and the CPU utilization of one NC shows 100% for 1 hour and
> > 30 minutes while other NCs have finished their job about 1 hour ago. Even
> > the problematic NC shows the following log at the end. So, looks like it's
> > done but I'm not sure why this job never finishes. It's a simple hash join
> > for 9M records on 9 nodes.
> >
> > Dec 01, 2016 7:18:02 PM org.apache.hyracks.control.common.work.WorkQueue$WorkerThread run
> > INFO: Executing: NotifyTaskComplete
> >
> > Best,
> > Taewoo
