I have now tried the suggestion of my previous append.

My first recommendation is that one should not change these values
(especially for a production system) away from the defaults, as some of my
co-committers have reminded me offline.

On my *test* system I did modify site.ducc.properties and the system ran
1.job just fine.  I did not examine resource consumption (CPU), though I
am sure it had to be higher to support the increased communications and
scheduling overhead.  And remember that 1.job is a "fake" job - the work
items only sleep so there is no competition for CPU.  Also, on the System
Daemons page, the ResourceManager showed as "down" every once in a while
(even though it was really up) because its minimum publish rate is 5
seconds.
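
As a rough illustration of the kind of override involved, here is a sketch
only.  The default and lowered values are the ones listed in the earlier
note quoted below.  The parse/merge helpers are hypothetical stand-ins
written for this example, not DUCC code, and they assume that
site.ducc.properties uses plain key=value lines whose entries take
precedence over the defaults.

```python
# Sketch only: mimics "site entries win over defaults" for key=value
# properties text. The helpers are illustrative, not DUCC's own loader.

# Default publish rates (milliseconds) as listed in the earlier note.
DEFAULTS = """\
ducc.jd.state.publish.rate=15000
ducc.orchestrator.state.publish.rate=10000
ducc.pm.state.publish.rate=15000
"""

# Lowered values floated in the earlier note; test systems only.
SITE_OVERRIDES = """\
ducc.jd.state.publish.rate=2000
ducc.orchestrator.state.publish.rate=1000
ducc.pm.state.publish.rate=1000
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def effective_properties(defaults_text, site_text):
    """Return the defaults with any site entries layered on top."""
    merged = parse_properties(defaults_text)
    merged.update(parse_properties(site_text))
    return merged


effective = effective_properties(DEFAULTS, SITE_OVERRIDES)
for key in sorted(effective):
    print(f"{key}={effective[key]} ms")
```

Keeping the changes in the site file rather than editing the defaults
makes it easy to back them out again.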

My second recommendation is to do one of the following instead:

1. submit Jobs with more than 1 work item
2. re-imagine your Job as a Service
3. use all-in-one local
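
To put rough numbers on option 1, here is a sketch under a big assumption:
that the 02:31 from submit to completed reported in the original question
is almost entirely fixed Job startup and teardown cost, and that this cost
does not grow with the number of work items.  The function name is
hypothetical.

```python
# Sketch only: amortizing a fixed per-Job startup cost over N work items.
# 151 s is the submit-to-completed time (02:31) reported in the original
# question; treating all of it as fixed overhead is an assumption.

STARTUP_OVERHEAD_S = 2 * 60 + 31  # 02:31 from the reported timeline


def overhead_per_item(work_items, overhead_s=STARTUP_OVERHEAD_S):
    """Fixed Job overhead divided evenly across the work items."""
    if work_items < 1:
        raise ValueError("a Job needs at least one work item")
    return overhead_s / work_items


for n in (1, 10, 100):
    print(f"{n:>3} work items: {overhead_per_item(n):6.2f} s overhead each")
```

With a single work item the whole cost lands on that item; batching many
short documents into one Job spreads it out.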

Lou.



On Mon, Nov 30, 2015 at 9:21 AM, Lou DeGenaro <[email protected]>
wrote:

> Yi-Wen,
>
> The latency you are experiencing is by-design for a large-ish computing
> cluster.  The normal life-cycle for a Job is:
>
> Received
> WaitingForDriver
> WaitingForResources
> Assigned
> Initializing
> Running
> Completing
> Completed
>
> There are some knobs you can turn to tune for your situation.
>
> 1. DUCC inter-daemon communications - states affected: All
>
> DUCC is implemented as a small collection of daemons that communicate
> with each other at discrete publishing intervals.  The publishing intervals
> are configured in $DUCC_HOME/resources/ducc.properties.  The default
> interval values are on the order of 15-60 seconds.  At the cost of more
> chatter between daemons on the network, you can try lowering some of these
> values.
>
> These times are the current default ones and are specified in milliseconds:
>
> ducc.jd.state.publish.rate=15000
> ducc.orchestrator.state.publish.rate=10000
> ducc.pm.state.publish.rate=15000
>
> I have not tried this myself, but perhaps try lowering them to:
>
> ducc.jd.state.publish.rate=2000
> ducc.orchestrator.state.publish.rate=1000
> ducc.pm.state.publish.rate=1000
>
> 2. DUCC scheduling - state affected: WaitingForResources
>
> The DUCC scheduler does not do continuous resource management, but rather
> calculates a desired allocation at discrete intervals.  After each
> scheduling cycle, the scheduler publishes its layout for the other daemons
> to implement.  By default, the scheduler is doing this calculation and
> publication whenever it receives an orchestrator.state publication:
>
> ducc.rm.state.publish.ratio = 1
>
> This seems fine as is.
>
> 3. DUCC deployment of Job - states affected: WaitingForDriver,
> Initializing
>
> Once a Job is accepted, the Job Driver [your CollectionReader] and one or
> more Job Processes [your AnalysisEngine] must be launched.
>
> The partial sequence of states here is:
>
> WaitingForDriver: The Job Driver is launched, and not until it reports
> that it is ready to produce work items will the next state
> (WaitingForResources) occur
> ...
> Initializing: A Job Process is launched, and not until it has completed
> initialization of all threads will it ask the Job Driver for the first work
> item
> Running: The first work item has been dispatched
>
> Minimizing the time for your CR to initialize will help make the
> transition from WaitingForDriver to WaitingForResources faster.
> Minimizing the time for your AE to initialize will help make the
> transition from Initializing to Running faster.
>
> Hope this helps.
>
> Lou.
>
> On Sun, Nov 29, 2015 at 11:25 PM, Yi-Wen Liu <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks for the reply, and yes, I only have a single work item.
>>
>> Thanks,
>> Yi-Wen
>>
>> On Sun, Nov 29, 2015 at 7:45 PM, Eddie Epstein <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > Yes, there are some site.ducc.properties entries that will speed up the
>> > timing. Will respond with those tomorrow.
>> > Are you often running jobs with only a single work item?
>> >
>> > Eddie
>> >
>> > On Sat, Nov 28, 2015 at 7:23 PM, Yi-Wen Liu <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I am using ducc to process text files (cTAKES), and one of my inputs is
>> > > quite short, about 10 lines.
>> > > But it takes more than two minutes to process it, as follows:
>> > > After submitting,
>> > > 00:00-00:08 > no status
>> > > 00:09-00:30 > waiting for driver
>> > > 00:31-01:00 > waiting for resources
>> > > 01:01-02:00 > initializing
>> > > 02:01-02:30 > completing
>> > > 02:31 > completed
>> > >
>> > > Is there any way to lower the preprocessing time? (Time to wait for
>> > > driver, resources, initializing...)
>> > >
>> > > I am wondering why it takes so long before completing, and have tried
>> > > different parameter values, for example lower initialization time, lower
>> > > resources needed, but didn't see much improvement.
>> > >
>> > > Here are the parameters I am using now:
>> > > process_memory_size 2
>> > > process_jvm_args -Xmx4g
>> > > driver_jvm_args -Xmx4g
>> > > process_thread_count 2
>> > > process_per_item_time_max 5
>> > > process_deployments_max 999
>> > > environment AE_INIT_TIME=5 AE_INIT_RANGE=5 INIT_ERROR=0
>> > >
>> > > Any suggestion is appreciated.
>> > >
>> > > Thanks,
>> > > Yi-Wen
>> > >
>> >
>>
>
>
