Hi Joseph,

I am not sure that this indicates a bug in the EventJobManager. It
could just as well be that the requests are simply timing out because
the chain does not finish within the 60 sec limit.

Possible reasons could be:

* Entityhub linking on a slow HDD (e.g. a laptop without an SSD) can be
slow. Especially if "Proper Noun Linking" is deactivated (meaning that
all nouns are matched against the vocabulary), processing large
documents will be time consuming. A lot of concurrent requests will
increase the processing time even further, as HDD IO is limited and
does not scale with concurrent requests.
* As ContentItems are kept in memory, heap size may also be the cause
of the issue. Concurrent requests with large documents require
additional memory, and if Stanbol runs into a low-memory situation,
processing times can increase dramatically.

I would suggest that you:

1. try increasing the heap (-Xmx parameter)
2. try configuring a chain without EntityLinking (e.g. langdetect plus
the openNLP engines) to check whether the EventJobManager
implementation is really the cause of your problem.
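
For (1) and (2), something along these lines. This is only a sketch:
the launcher jar name, the heap size, and the chain name "mychain" are
placeholders for whatever you use locally.

```
# (1) Start the launcher with a larger heap. 2g is an arbitrary
#     starting point; adjust to the memory available on your machine.
java -Xmx2g -jar org.apache.stanbol.launchers.full-<version>.jar

# (2) Send one of the problematic documents to a chain that does not
#     include EntityLinking ("mychain" is a placeholder for a chain
#     configured with langdetect + the openNLP engines only).
curl -X POST -H "Content-Type: text/plain" \
     --data-binary @large-document.txt \
     "http://localhost:8080/enhancer/chain/mychain"
```

If the timeouts disappear with such a chain, EntityLinking (and not
the EventJobManager) is the likely culprit.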

best
Rupert


On Tue, Oct 29, 2013 at 3:39 PM, Joseph M'Bimbi-Bene
<jbi...@object-ive.com> wrote:
> Another interesting fact is that, looking at the "monitor usage", most of
> the blocker threads (50% of the time) have the following stack trace:
>   -java.util.Currency.getInstance(String, int, int)
>
> -org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run()
>
>       -EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run()
>         -java.lang.Thread.run()
>
> I have version 1.2.14 of felix.eventAdmin, but in the source code there
> is no call to Currency.getInstance.
>
>
> On Tue, Oct 29, 2013 at 3:30 PM, Joseph M'Bimbi-Bene
> <jbi...@object-ive.com>wrote:
>
>> Hello everybody,
>>
>> I'm having a problem with Stanbol when trying to enhance a lot of
>> somewhat "large" documents (40,000 to 60,000 characters).
>>
>> Depending on the enhancement chain I use, I get timeouts earlier or
>> later. The timeout is configured by default.
>> (langdetect + token + pos + sentence + dbPedia) = timeout after roughly
>> the 10th enhancement request.
>> (langdetect + token + dbPedia) = timeout after about 10 minutes.
>>
>> I monitored Stanbol in the first case (langdetect + token + pos + sentence
>> + dbPedia) with Yourkit Java Profiler.
>>
>> I noticed that CPU wise, the hotspots are
>>   -opennlp.tools.util.BeamSearch.bestSequences(int, Object[], Object[],
>> double)  with 11% of the time spent.
>>   -opennlp.tools.util.Sequence.<init>(Sequence, String, double) 2%.
>>
>> Memory wise, the hotspots are:
>>   -opennlp.tools.util.BeamSearch.bestSequences(int, Object[], Object[],
>> double)  with 12% of space taken.
>>
>> I modified the following parameters in the
>> {stanbol-working-dir}\stanbol\config\org\apache\felix\eventadmin\impl\EventAdmin.config
>> file:
>> org.apache.felix.eventadmin.ThreadPoolSize="100"
>> org.apache.felix.eventadmin.CacheSize="2048"
>>
>> My impression was that this merely delayed the timeouts.
>>
>> Anyway, I noticed that A LOT of threads were being created, then
>> immediately going to the "waiting" state, and then dying after 60
>> seconds, exactly the "stanbol.maxEnhancementJobWaitTime" parameter.
>>
>> What other information can I provide?
>>



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
