I have now tried the suggestion from my previous append. My first recommendation is not to change these values away from the defaults, especially on a production system, as some of my co-committers have reminded me offline.
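
For anyone who still wants to experiment on a test system, the override would look roughly like the sketch below. It assumes that site.ducc.properties sits next to ducc.properties in $DUCC_HOME/resources and that its entries take precedence over the defaults. The values are simply the ones floated in the earlier message quoted below, not tuned recommendations.

```properties
# site.ducc.properties (test system only; not advised for production)
# Assumption: entries here override the defaults in ducc.properties.
# All values are publish intervals in milliseconds.
ducc.jd.state.publish.rate=2000
ducc.orchestrator.state.publish.rate=1000
ducc.pm.state.publish.rate=1000
```

Presumably the daemons have to be restarted before the new rates take effect.
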
On my *test* system I did modify site.ducc.properties and the system ran 1.job just fine. I did not examine resource consumption (CPU), though I am sure it had to be higher to support the increased communications and scheduling overhead. And remember that 1.job is a "fake" job - the work items only sleep so there is no competition for CPU. Also, on the System Daemons page, the ResourceManager showed as "down" every once in a while (even though it was really up) because its minimum publish rate is 5 seconds.

My second recommendation is to do one of the following instead:

1. submit Jobs with more than 1 work item
2. re-imagine your Job as a Service
3. use all-in-one local

Lou.

On Mon, Nov 30, 2015 at 9:21 AM, Lou DeGenaro <[email protected]> wrote:

> Yi-Wen,
>
> The latency you are experiencing is by-design for a large-ish computing
> cluster. The normal life-cycle for a Job is:
>
> Received > WaitingForDriver > WaitingForResources > Assigned >
> Initializing > Running > Completing > Completed
>
> There are some knobs you can turn to tune for your situation.
>
> 1. DUCC intra-daemon communications - states affected: All
>
> DUCC is implemented as a small collection of daemons that communicate
> with each other at discrete publishing intervals. The publishing intervals
> are configured in $DUCC_HOME/resources/ducc.properties. The default
> interval values are on the order of 15-60 seconds. At the cost of more
> chatter between daemons on the network, you can try lowering some of these
> values.
>
> These times are the current default ones and are specified in milliseconds:
>
> ducc.jd.state.publish.rate=15000
> ducc.orchestrator.state.publish.rate=10000
> ducc.pm.state.publish.rate=15000
>
> I have not tried this myself, but perhaps try lowering them to:
>
> ducc.jd.state.publish.rate=2000
> ducc.orchestrator.state.publish.rate=1000
> ducc.pm.state.publish.rate=1000
>
> 2. DUCC scheduling - state affected: WaitingForResources
>
> The DUCC scheduler does not do continuous resource management, but rather
> calculates a desired allocation at discrete intervals. After each
> scheduling cycle, the scheduler publishes its layout for the other daemons
> to implement. By default, the scheduler is doing this calculation and
> publication whenever it receives an orchestrator.state publication:
>
> ducc.rm.state.publish.ratio = 1
>
> This seems fine as is.
>
> 3. DUCC deployment of Job - states affected: WaitingForDriver,
> Initializing
>
> Once a Job is accepted, the Job Driver [your CollectionReader] and one or
> more Job Processes [your AnalysisEngine] must be launched.
>
> The partial sequence of states here is:
>
> WaitingForDriver: The Job Driver is launched, and not until it reports
> that it is ready to produce work items will the next state
> (WaitingForResources) occur
> ...
> Initializing: A Job Process is launched, and not until it has completed
> initialization of all threads will it ask the Job Driver for the first work
> item
> Running: The first work item has been dispatched
>
> Minimizing the time for your CR to initialize will help make the
> transition from WaitingForDriver to WaitingForResources faster.
> Minimizing the time for your AE to initialize will help make the
> transition from Initializing to Running faster.
>
> Hope this helps.
>
> Lou.
>
> On Sun, Nov 29, 2015 at 11:25 PM, Yi-Wen Liu <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks for the reply, and yes, I only have a single work item.
>>
>> Thanks,
>> Yi-Wen
>>
>> On Sun, Nov 29, 2015 at 7:45 PM, Eddie Epstein <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > Yes, there are some site.ducc.property entries that will speed up the
>> > timing. Will respond with those tomorrow.
>> > Are you often running jobs with only a single work item?
>> >
>> > Eddie
>> >
>> > On Sat, Nov 28, 2015 at 7:23 PM, Yi-Wen Liu <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I am using ducc to process text files(cTAKES), and one of my input is
>> > > quite short, about 10 lines.
>> > > But it takes more than two minutes to process it, as follows:
>> > > After submitting,
>> > > 00:00-00:08 > no status
>> > > 00:09-00:30 > waiting for driver
>> > > 00:31-01:00 > waiting for resources
>> > > 01:01-02:00 > initializing
>> > > 02:01-02:30 > completing
>> > > 02:31 > completed
>> > >
>> > > Is there any way to lower the preprocessing time?(Time to wait for
>> > > driver, resources, initializing...)
>> > >
>> > > I am wondering why it takes so long before completing, and have tried
>> > > different parameter values, for example lower initialization time,
>> > > lower resources needed, but didn't have much improvement.
>> > >
>> > > Here's parameters I am using now:
>> > > process_memory_size 2
>> > > process_jvm_args -Xmx4g
>> > > driver_jvm_args -Xmx4g
>> > > process_thread_count 2
>> > > process_per_item_time_max 5
>> > > process_deployments_max 999
>> > > environment AE_INIT_TIME=5 AE_INIT_RANGE=5 INIT_ERROR=0
>> > >
>> > > Any suggestion is appreciated.
>> > >
>> > > Thanks,
>> > > Yi-Wen
