Hi Eddie,

Thanks again for your reply.
That is strange that the node memory size showed that much. I only have 16GB of RAM, and I'm running it on a MacBook Pro under OS X. When I run DUCC, I try not to run any unnecessary processes or applications, so I don't understand how it could be reporting that much. Given the odd node memory size, I wonder whether it's a problem with my computer or just an initialization setting. I did check the file system with Disk Utility, and it isn't showing any irregularities.

Looking back at the job details page of ducc-mon, the RSS shown for the job I posted last time was 0 for both the JD and JP processes.

Thanks for taking the time to replicate the job I'm running. I don't want to take up too much of your time, but thank you so much for doing so; it is much appreciated. cTAKES can be a bit tricky to configure; I've spent some time on it. Let me know if you come across problems.

Best,
Selina

On Fri, Mar 11, 2016 at 1:41 PM, Eddie Epstein <eaepst...@gmail.com> wrote:
> Selina,
>
> Thanks for the log files. The agent log shows "Node Memory Total:251392 MB", which looks like a healthy size. Is this value correct? Just in case, what OS are you running on?
>
> Unfortunately the RSS size for the JD and JP processes is not shown in the agent log in that version of DUCC. They should be shown on the job details page of ducc-mon.
>
> The agent log does confirm that cgroups are not enabled, which should eliminate the possibility that the JD was swapping. That leaves me puzzled about the JD behavior.
>
> The need for a JD sending references to input data rather than the data itself is to try to avoid making the JD a bottleneck when processing is scaled out. Not yet clear that the current CR is the cause of the erratic behavior.
>
> I attempted to replicate your job here but got stuck on UMLS authentication and am now waiting for approval to use UMLS.
>
> DUCC's rogue detection is intended for machines that are fully managed by DUCC. The default properties include a small number of processes, like ssh and bash, which are useful to ignore. All UIDs below a specified threshold are also ignored. Certainly OK to customize to ignore specific users or process names, but remember that DUCC will attempt to utilize all memory that was not taken by system users when the agent started. Unexpected memory use by other processes can lead to over-committing system memory.
>
> Below are the lines to modify as desired, add to the file site.ducc.properties, and then restart DUCC. The site file facilitates migration to new DUCC versions.
>
> # max UID reserved by OS. This is used to detect rogue processes and to report available memory on a node.
> ducc.agent.node.metrics.sys.gid.max=500
> # exclude the following user ids while detecting rogue processes
> ducc.agent.rogue.process.user.exclusion.filter=
> # exclude the following processes while detecting rogue processes
> ducc.agent.rogue.process.exclusion.filter=sshd:,-bash,-sh,/bin/sh,/bin/bash,grep,ps
>
> Regards,
> Eddie
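For illustration only, a filled-in version of those site.ducc.properties lines might look something like the sketch below. The property names are copied from Eddie's template above; the user id and the two extra process names are placeholders for whatever actually shows up as rogue in the agent log, not values from this thread.

# max UID reserved by OS; left at the default
ducc.agent.node.metrics.sys.gid.max=500
# ignore processes owned by this login while detecting rogue processes (hypothetical user id)
ducc.agent.rogue.process.user.exclusion.filter=selina
# ignore these process names while detecting rogue processes (the last two are hypothetical OS X daemons)
ducc.agent.rogue.process.exclusion.filter=sshd:,-bash,-sh,/bin/sh,/bin/bash,grep,ps,mds,mdworker

Keep in mind Eddie's caveat: anything excluded here is memory DUCC will assume it can use, so excluding busy processes can lead to over-committing the node.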
> On Fri, Mar 11, 2016 at 1:38 PM, Selina Chu <selina....@gmail.com> wrote:
> > Hi Eddie,
> >
> > Thanks for the pointer about not putting the analytic pipeline in the JD driver. It seems we had misunderstood its use. We'll look into modifying it so that the JD driver contains only the collection reader component. Hopefully cTAKES will let us do so.
> >
> > As suggested, I restarted DUCC and ran the same job once. The agent.log file is quite big, so I've placed it in a repo, along with other related logs, here: https://github.com/selinachu/Templogs/tree/master/NewLogs_Mar11
> >
> > I noticed that the agent log indicated many rogue processes. Would it be helpful to modify the settings in ducc.properties to clean up these processes?
> >
> > Thanks again for your help.
> >
> > Cheers,
> > Selina
> >
> > On Thu, Mar 10, 2016 at 10:30 AM, Eddie Epstein <eaepst...@gmail.com> wrote:
> > > Hi,
> > >
> > > DUCC has some logfiles that show more details of the machine and the job, which would allow us to answer your questions about machine physical resources. These are located in $DUCC_HOME/logs, and in particular the agent log would be very helpful. The logfile name is {machine name}.{domain}.agent.log
> > > Please restart ducc so we can see the log from agent startup thru running the job one time.
> > >
> > > As for the JD memory requirement, the JD driver should not contain any of the analytic pipeline. Its purpose is normally to send a reference to the input data to the Job Processes, which will read the input data, process it and write results. (This is described at http://uima.apache.org/d/uima-ducc-2.0.0/duccbook.html#x1-1600008.1 )
> > >
> > > It should be possible for you to take *just* the collection reader component from the cTAKES pipeline and use that for the JobDriver. Hopefully this would need much less than Xmx400m.
> > >
> > > Regards,
> > > Eddie
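To make the CR-only JobDriver idea concrete, a job specification along those lines might look roughly like the sketch below. Except for driver_jvm_args (deliberately left unset here so the JD keeps its default heap), none of these keys appear in this thread; they are the usual DUCC job-specification parameters as I understand them, and every path, descriptor name, and size is a placeholder for the actual cTAKES setup rather than a tested value.

description            cTAKES job with a collection-reader-only Job Driver
classpath              /path/to/ctakes/lib/*:/path/to/ctakes/resources
# the JD runs only the collection reader, so the default Xmx400m driver heap should suffice
driver_descriptor_CR   desc/cr/NotesCollectionReader.xml
# the cTAKES analysis pipeline, and its large heap, go in the Job Processes instead
process_descriptor_AE  desc/ae/cTakesAggregate.xml
process_jvm_args       -Xmx4g
process_memory_size    6

Such a file would be passed to ducc_submit as the job specification. The point of the split is exactly what Eddie describes: the JD only hands out references to the input documents, while the memory-hungry cTAKES models load in the Job Processes.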
> > > On Thu, Mar 10, 2016 at 12:07 PM, Selina Chu <selina....@gmail.com> wrote:
> > > > Hi Eddie,
> > > >
> > > > Thanks so much for taking the time to look at my issue and for your reply.
> > > >
> > > > The reason I had to increase the heap size for the JD is that I'm running cTAKES (http://ctakes.apache.org/) with DUCC. The increased heap size is to accommodate loading all the cTAKES models into memory. Before, when I didn't increase the memory size, DUCC would cancel the driver and end; cTAKES would return the error "java.lang.OutOfMemoryError: Java heap space".
> > > >
> > > > Would you say that this problem is mainly a limitation of my physical memory and the processes running on my computer, or can it be adjusted in DUCC, for example with parameter adjustments so I can use an increased heap size, or maybe a way to pre-allocate enough memory for DUCC?
> > > >
> > > > Thanks again,
> > > > Selina
> > > >
> > > > On Wed, Mar 9, 2016 at 7:35 PM, Eddie Epstein <eaepst...@gmail.com> wrote:
> > > > > Hi Selina,
> > > > >
> > > > > I suspect that the problem is due to the following job parameter:
> > > > > driver_jvm_args -Xmx4g
> > > > >
> > > > > This would certainly be true if cgroups have been configured on for DUCC. The default cgroup size for a JD is 450MB, so specifying an Xmx of 4GB can cause the JVM to spill into swap space and cause erratic behavior.
> > > > >
> > > > > Comparing a "fast" job (96) vs a "slow" job (97), the time to process the single work item was 8 sec vs 9 sec:
> > > > > 09 Mar 2016 08:46:08,556 INFO JobDriverHelper - T[20] summarize workitem statistics [sec] avg=8.14 min=8.14 max=8.14 stddev=.00
> > > > > vs
> > > > > 09 Mar 2016 08:56:46,583 INFO JobDriverHelper - T[19] summarize workitem statistics [sec] avg=9.41 min=9.41 max=9.41 stddev=.00
> > > > >
> > > > > The extra delays between the two jobs appear to be associated with the Job Driver.
> > > > >
> > > > > Was there some reason you specified a heap size for the JD? The default JD heap size is Xmx400m.
> > > > >
> > > > > Regards,
> > > > > Eddie
> > > > >
> > > > > On Wed, Mar 9, 2016 at 2:41 PM, Selina Chu <selina....@gmail.com> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I'm kind of new to DUCC and this forum. I was hoping someone could give me some insight into why DUCC is behaving strangely and somewhat unstably.
> > > > > >
> > > > > > What I'm trying to do: I'm using DUCC to process a cTAKES job. Currently DUCC is using just a single node. DUCC seems to act randomly in processing the jobs, varying between 4.5 minutes and 23 minutes, and I wasn't running anything else that is CPU intensive. When I don't use DUCC and run cTAKES alone, the processing times are pretty consistent.
> > > > > >
> > > > > > To demonstrate this strange behavior in DUCC, I submitted the exact same job 10 times in a row (job IDs 95-104), without modifying the settings. The durations for the jobs were: 4:41, 4:43, 12:48, 8:41, 5:24, 4:38, 7:07, 23:08, 8:08, 20:37 (canceled by the system). The first 9 jobs completed and the last one was canceled. Even before the last job, the first 9 jobs varied in duration. After restarting DUCC a couple of times and resetting it, I submitted the same job again (job ID 110); that job completed without a problem, though with a long processing time.
> > > > > >
> > > > > > I noticed that when a job takes a long time to finish, past 5 minutes, it seems to be stuck in the "initializing" and "completing" states the longest.
> > > > > >
> > > > > > It seems like DUCC is doing something random. I tried examining the log files, but they are all similar, except for the time between each state. (I've also placed the related logs and job file in a repo, https://github.com/selinachu/Templogs, in case anyone is interested in examining them.)
> > > > > >
> > > > > > I'm baffled by the random behavior from DUCC. I was hoping someone could clarify this for me.
> > > > > >
> > > > > > After completing a job, what does DUCC do? Does it save something in memory that carries over to the next job, which probably relates to the initialization process? Are there parameter settings that might alleviate this type of behavior?
> > > > > >
> > > > > > I would appreciate any insight.
> > > > > > Thanks in advance for your help.
> > > > > >
> > > > > > Cheers,
> > > > > > Selina Chu