Re: Per-repository thread pool in Jackrabbit

Marcel Reutegger Mon, 13 Jul 2009 00:13:22 -0700

Hi,

2009/7/12 Jukka Zitting <[email protected]>:
> Hi,
>
> 2009/7/8 Marcel Reutegger <[email protected]>:
>> - paralleled execution of some work. this is primarily to make use of
>> multi-core processors. execution should be distributed over and
>> executed by N threads which is a factor of the available processors.
>
> If I recall correctly we debated this already earlier. My point was
> that limiting the number of tasks to the number of available
> processors may not be a good approach as the tasks may be IO-bound or
> block for other reasons, in which case having more task threads would
> give you better throughput. But I recall being proven wrong, did we
> have some benchmark for that? Do you remember where this discussion
> was?


I don't remember either... But let's just start a new one.

I think this very much depends on the work that needs to be distributed.
there is no proof that one way is better than the other. for CPU intensive
work we'd probably want to limit the number of concurrent tasks. for I/O
intensive work the concurrency should be higher.

my point above was about CPU intensive work, e.g. creating a posting list
while content is indexed. but of course there might be other work that
could be parallelized more aggressively.

I guess the actual pool shouldn't care about that. some utility on top of
the pool should provide that functionality, i.e. execute a number of tasks
with a given level of concurrency. the utility would then dispatch the
tasks to the pool accordingly.
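
a rough sketch of what such a utility could look like. BoundedDispatcher
and its invokeAll() method are hypothetical names, nothing like this exists
in jackrabbit today. it only assumes the shared pool is a plain
java.util.concurrent.ExecutorService and uses a Semaphore to cap how many
of the submitted tasks run at once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

/** Hypothetical helper on top of a shared pool; not part of Jackrabbit. */
final class BoundedDispatcher {

    private final ExecutorService pool;

    BoundedDispatcher(ExecutorService pool) {
        this.pool = pool;
    }

    /**
     * Runs all tasks on the shared pool, but never more than maxConcurrent
     * of them at the same time. Blocks the caller until all tasks are done
     * and returns the results in submission order.
     */
    <T> List<T> invokeAll(List<? extends Callable<T>> tasks, int maxConcurrent)
            throws InterruptedException, ExecutionException {
        Semaphore permits = new Semaphore(maxConcurrent);
        List<Future<T>> futures = new ArrayList<>();
        for (Callable<T> task : tasks) {
            permits.acquire(); // wait until one of the slots is free
            Callable<T> wrapped = () -> {
                try {
                    return task.call();
                } finally {
                    permits.release(); // free the slot for the next task
                }
            };
            try {
                futures.add(pool.submit(wrapped));
            } catch (RuntimeException e) {
                permits.release(); // task was rejected and will never run
                throw e;
            }
        }
        List<T> results = new ArrayList<>();
        for (Future<T> future : futures) {
            results.add(future.get());
        }
        return results;
    }
}
```

CPU intensive callers would pass something derived from
Runtime.getRuntime().availableProcessors() as maxConcurrent, I/O bound
callers a higher value. the pool itself stays unaware of either.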

>> - Timers used in TransactionContext and MultiIndex. This could be
>> turned into a scheduling mechanism that could also be used by the
>> ClusterNode sync. Other classes that use periodic checks in a
>> background thread: DatabaseJournal (ClusterRevisionJanitor),
>> CooperativeFileLock (watch dog).
>
> Yep. Perhaps we could also reuse some of the scheduling functionality in 
> Sling.

I'm not sure this is needed. the java runtime library already comes with
the Timer and TimerTask classes. our needs are very simple and I'm not
sure that justifies a new dependency.
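
to illustrate how little is needed, here is a minimal sketch of a periodic
check on a single shared daemon timer using only java.util.Timer and
java.util.TimerTask. PeriodicCheckDemo, its schedule() helper and the
timer name are made up for the example, they are not existing jackrabbit
code:

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical example; not existing Jackrabbit code. */
final class PeriodicCheckDemo {

    /** Schedules a periodic check on the given (shared) timer. */
    static TimerTask schedule(Timer timer, Runnable check, long periodMillis) {
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                check.run();
            }
        };
        // first run after one period, then repeat with a fixed delay
        timer.schedule(task, periodMillis, periodMillis);
        return task; // the caller may cancel() it later
    }

    public static void main(String[] args) throws InterruptedException {
        // one daemon timer thread that several modules could share
        Timer timer = new Timer("repository-timer", true);
        CountDownLatch ticks = new CountDownLatch(3);
        TimerTask task = schedule(timer, ticks::countDown, 50);
        boolean done = ticks.await(5, TimeUnit.SECONDS);
        task.cancel();
        timer.cancel();
        System.out.println("three checks ran: " + done);
    }
}
```

one thing to keep in mind: all tasks of a Timer run on its single thread,
so a slow check delays the others that share the same timer.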

>> the more I think about it, the more I like your idea. but we should be
>> careful with a maximum size for a repository wide pool. extensive use
>> of the pool by a module should not lock up another module just because
>> there are no more idle threads. maybe that global pool shouldn't have
>> a maximum size...
>
> That might make sense. Perhaps we should have some concept of
> sub-pools (that borrow from the main pool) with fixed limits for tasks
> that need them (see above).

hmm, that doesn't sound flexible and generic. I just thought again how cool
it would be if we could deploy jackrabbit to Google App Engine. that however
requires that all background threads are removed. if we have that generic
pool and the client code is adjusted accordingly, it could be as easy as
turning the pool into a direct executor variant ;) well, that's very
optimistic but sounds promising to me...
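
by "direct executor variant" I mean something like the sketch below. it
assumes client code only ever sees the java.util.concurrent.Executor
interface. ExecutorVariants and threadUsedBy() are hypothetical names for
the example. the direct variant runs every task in the calling thread, so
no background thread is ever created:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical example; not existing Jackrabbit code. */
final class ExecutorVariants {

    /** Runs each task immediately in the thread that calls execute(). */
    static final class DirectExecutor implements Executor {
        @Override
        public void execute(Runnable task) {
            task.run();
        }
    }

    /**
     * Stand-in for client code: it hands work to whatever Executor it is
     * given and reports the name of the thread that ran the work.
     */
    static String threadUsedBy(Executor executor) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> name = new AtomicReference<>();
        executor.execute(() -> {
            name.set(Thread.currentThread().getName());
            done.countDown();
        });
        done.await(); // returns at once with a direct executor
        return name.get();
    }
}
```

with a DirectExecutor, threadUsedBy() reports the caller's own thread. with
a real thread pool it reports a pool thread. the client code is the same in
both cases, only the executor that gets injected differs.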

regards
 marcel
