Hi Eddie,

Thanks again for your reply.
That is strange that the node memory size showed that much. I only have 16GB of RAM, and I'm running it on a MacBook Pro under OS X. When I run DUCC, I try not to run any unnecessary processes or applications, so I don't understand how it could be reporting that much. Given the odd node memory size, I wonder whether it's a problem with my computer or just an initialization setting. I did check the file system with Disk Utility, and it isn't showing any irregularities.

Looking back at the job details page of ducc-mon, the RSS shown for the job I posted last time was 0 for both the JD and JP processes.

Thanks for taking the time to replicate the job I'm running. I don't want to take up too much of your time, but thank you so much for doing so; it is much appreciated. cTAKES can be a bit tricky to configure; I've spent some time on it. Let me know if you come across problems.

Best,
Selina

On Fri, Mar 11, 2016 at 1:41 PM, Eddie Epstein <eaepst...@gmail.com> wrote:
> Selina,
>
> Thanks for the log files. The agent log shows "Node Memory Total:251392 MB", which looks like a healthy size. Is this value correct? Just in case, what OS are you running on?
>
> Unfortunately the RSS size for the JD and JP processes is not shown in the agent log in that version of DUCC. They should be shown on the job details page of ducc-mon.
>
> The agent log does confirm that cgroups are not enabled, which should eliminate the possibility that the JD was swapping. That leaves me puzzled about the JD behavior.
>
> The need for a JD sending references to input data rather than the data itself is to try to avoid making the JD a bottleneck when processing is scaled out. Not yet clear that the current CR is the cause of the erratic behavior.
>
> I attempted to replicate your job here but got stuck on UMLS authentication and am now waiting for approval to use UMLS.
>
> DUCC's rogue detection is intended for machines that are fully managed by DUCC. The default properties include a small number of processes, like ssh and bash, which are useful to ignore. All UIDs below a specified threshold are also ignored. Certainly OK to customize to ignore specific users or process names, but remember that DUCC will attempt to utilize all memory that was not taken by system users when the agent started. Unexpected memory use by other processes can lead to over-committing system memory.
>
> Below are the lines to modify as desired, add to the file site.ducc.properties, and then restart DUCC. The site file facilitates migration to new DUCC versions.
>
> # max UID reserved by OS. This is used to detect rogue processes and to report available memory on a node.
> ducc.agent.node.metrics.sys.gid.max=500
> # exclude the following user ids while detecting rogue processes
> ducc.agent.rogue.process.user.exclusion.filter=
> # exclude the following processes while detecting rogue processes
> ducc.agent.rogue.process.exclusion.filter=sshd:,-bash,-sh,/bin/sh,/bin/bash,grep,ps
>
> Regards,
> Eddie
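For illustration only, a filled-in version of those site.ducc.properties lines might look something like the sketch below. The property names are copied from Eddie's template above; the user id and the two extra process names are placeholders for whatever actually shows up as rogue in the agent log, not values from this thread.

# max UID reserved by OS; left at the default
ducc.agent.node.metrics.sys.gid.max=500
# ignore processes owned by this login while detecting rogue processes (hypothetical user id)
ducc.agent.rogue.process.user.exclusion.filter=selina
# ignore these process names while detecting rogue processes (the last two are hypothetical OS X daemons)
ducc.agent.rogue.process.exclusion.filter=sshd:,-bash,-sh,/bin/sh,/bin/bash,grep,ps,mds,mdworker

Keep in mind Eddie's caveat: anything excluded here is memory DUCC will assume it can use, so excluding busy processes can lead to over-committing the node.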
> On Fri, Mar 11, 2016 at 1:38 PM, Selina Chu <selina....@gmail.com> wrote:
> > Hi Eddie,
> >
> > Thanks for the pointer about not putting the analytic pipeline in the JD driver. It seems we had misunderstood its use. We'll look into modifying it so that the JD driver contains only the collection reader component. Hopefully cTAKES will let us do so.
> >
> > As suggested, I restarted DUCC and ran the same job once. The agent.log file is quite big, so I've placed it in a repo, along with other related logs, here: https://github.com/selinachu/Templogs/tree/master/NewLogs_Mar11
> >
> > I noticed that the agent log indicated many rogue processes. Would it be helpful to modify the settings in ducc.properties to clean up these processes?
> >
> > Thanks again for your help.
> >
> > Cheers,
> > Selina
> >
> > On Thu, Mar 10, 2016 at 10:30 AM, Eddie Epstein <eaepst...@gmail.com> wrote:
> > > Hi,
> > >
> > > DUCC has some logfiles that show more details of the machine and the job, which would allow us to answer your questions about machine physical resources. These are located in $DUCC_HOME/logs, and in particular the agent log would be very helpful. The logfile name is {machine name}.{domain}.agent.log
> > > Please restart ducc so we can see the log from agent startup thru running the job one time.
> > >
> > > As for the JD memory requirement, the JD driver should not contain any of the analytic pipeline. Its purpose is normally to send a reference to the input data to the Job Processes, which will read the input data, process it and write results. (This is described at http://uima.apache.org/d/uima-ducc-2.0.0/duccbook.html#x1-1600008.1 )
> > >
> > > It should be possible for you to take *just* the collection reader component from the cTAKES pipeline and use that for the JobDriver. Hopefully this would need much less than Xmx400m.
> > >
> > > Regards,
> > > Eddie
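To make the CR-only JobDriver idea concrete, a job specification along those lines might look roughly like the sketch below. Except for driver_jvm_args (deliberately left unset here so the JD keeps its default heap), none of these keys appear in this thread; they are the usual DUCC job-specification parameters as I understand them, and every path, descriptor name, and size is a placeholder for the actual cTAKES setup rather than a tested value.

description            cTAKES job with a collection-reader-only Job Driver
classpath              /path/to/ctakes/lib/*:/path/to/ctakes/resources
# the JD runs only the collection reader, so the default Xmx400m driver heap should suffice
driver_descriptor_CR   desc/cr/NotesCollectionReader.xml
# the cTAKES analysis pipeline, and its large heap, go in the Job Processes instead
process_descriptor_AE  desc/ae/cTakesAggregate.xml
process_jvm_args       -Xmx4g
process_memory_size    6

Such a file would be passed to ducc_submit as the job specification. The point of the split is exactly what Eddie describes: the JD only hands out references to the input documents, while the memory-hungry cTAKES models load in the Job Processes.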
> > > On Thu, Mar 10, 2016 at 12:07 PM, Selina Chu <selina....@gmail.com> wrote:
> > > > Hi Eddie,
> > > >
> > > > Thanks so much for taking the time to look at my issue and for your reply.
> > > >
> > > > The reason I had to increase the heap size for the JD is that I'm running cTAKES (http://ctakes.apache.org/) with DUCC. The increased heap size is to accommodate loading all the cTAKES models into memory. Before, when I didn't increase the memory size, DUCC would cancel the driver and end; cTAKES would return the error "java.lang.OutOfMemoryError: Java heap space".
> > > >
> > > > Would you say that this problem is mainly a limitation of my physical memory and the processes running on my computer, or can it be adjusted in DUCC, for example with parameter adjustments so I can use an increased heap size, or maybe a way to pre-allocate enough memory for DUCC?
> > > >
> > > > Thanks again,
> > > > Selina
> > > >
> > > > On Wed, Mar 9, 2016 at 7:35 PM, Eddie Epstein <eaepst...@gmail.com> wrote:
> > > > > Hi Selina,
> > > > >
> > > > > I suspect that the problem is due to the following job parameter:
> > > > > driver_jvm_args -Xmx4g
> > > > >
> > > > > This would certainly be true if cgroups have been configured on for DUCC. The default cgroup size for a JD is 450MB, so specifying an Xmx of 4GB can cause the JVM to spill into swap space and cause erratic behavior.
> > > > >
> > > > > Comparing a "fast" job (96) vs a "slow" job (97), the time to process the single work item was 8 sec vs 9 sec:
> > > > > 09 Mar 2016 08:46:08,556 INFO JobDriverHelper - T[20] summarize workitem statistics [sec] avg=8.14 min=8.14 max=8.14 stddev=.00
> > > > > vs
> > > > > 09 Mar 2016 08:56:46,583 INFO JobDriverHelper - T[19] summarize workitem statistics [sec] avg=9.41 min=9.41 max=9.41 stddev=.00
> > > > >
> > > > > The extra delays between the two jobs appear to be associated with the Job Driver.
> > > > >
> > > > > Was there some reason you specified a heap size for the JD? The default JD heap size is Xmx400m.
> > > > >
> > > > > Regards,
> > > > > Eddie
> > > > >
> > > > > On Wed, Mar 9, 2016 at 2:41 PM, Selina Chu <selina....@gmail.com> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I'm kind of new to DUCC and this forum. I was hoping someone could give me some insight into why DUCC is behaving strangely and somewhat unstably.
> > > > > >
> > > > > > What I'm trying to do: I'm using DUCC to process a cTAKES job. Currently DUCC is using just a single node. DUCC seems to act randomly in processing the jobs, varying between 4.5 minutes and 23 minutes, and I wasn't running anything else that is CPU intensive. When I don't use DUCC and run cTAKES alone, the processing times are pretty consistent.
> > > > > >
> > > > > > To demonstrate this strange behavior in DUCC, I submitted the exact same job 10 times in a row (job IDs 95-104), without modifying the settings. The durations for the jobs were: 4:41, 4:43, 12:48, 8:41, 5:24, 4:38, 7:07, 23:08, 8:08, 20:37 (canceled by the system). The first 9 jobs completed and the last one was canceled. Even before the last job, the first 9 jobs varied in duration. After restarting DUCC a couple of times and resetting it, I submitted the same job again (job ID 110); that job completed without a problem, though with a long processing time.
> > > > > >
> > > > > > I noticed that when a job takes a long time to finish, past 5 minutes, it seems to be stuck in the "initializing" and "completing" states the longest.
> > > > > >
> > > > > > It seems like DUCC is doing something random. I tried examining the log files, but they are all similar, except for the time between each state. (I've also placed the related logs and job file in a repo, https://github.com/selinachu/Templogs, in case anyone is interested in examining them.)
> > > > > >
> > > > > > I'm baffled by the random behavior from DUCC. I was hoping someone could clarify this for me.
> > > > > >
> > > > > > After completing a job, what does DUCC do? Does it save something in memory that carries over to the next job, which probably relates to the initialization process? Are there parameter settings that might alleviate this type of behavior?
> > > > > >
> > > > > > I would appreciate any insight.
> > > > > > Thanks in advance for your help.
> > > > > >
> > > > > > Cheers,
> > > > > > Selina Chu