Hi,

Thanks for the reply. I'm hearing a few things at once, so let me try 
to tease them apart.

The first is that UIMA already caches feature structure address 
lookups. My tests show that this cache's lookup time is not as good as 
a plain Java HashMap's. Does the UIMA implementation have benefits I am 
not seeing, perhaps in memory usage? If those benefits do not apply to 
my use case, is there a way for me to register my own Map 
implementation?
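For reference, the shape of the comparison I ran is roughly this; the 
sizes are illustrative and the harness is a simplification of my actual 
tests, not the tests themselves:

    // Simplified sketch: time lookups of integer "addresses" in a plain
    // java.util.HashMap, the structure I'm comparing the UIMA cache against.
    import java.util.HashMap;
    import java.util.Map;

    public class LookupBench {
        public static void main(String[] args) {
            final int SIZE = 100_000;        // hypothetical number of cached FS
            final int LOOKUPS = 100_000_000; // matches the round size in my tests

            Map<Integer, Object> cache = new HashMap<>(SIZE * 2);
            for (int i = 0; i < SIZE; i++) {
                cache.put(i, new Object());
            }

            for (int round = 0; round < 5; round++) {
                long hits = 0; // accumulate so the JIT can't elide the loop
                long start = System.nanoTime();
                for (int i = 0; i < LOOKUPS; i++) {
                    if (cache.get(i % SIZE) != null) {
                        hits++;
                    }
                }
                double seconds = (System.nanoTime() - start) / 1e9;
                System.out.printf("round %d total time: %.9fs (%d hits)%n",
                        round, seconds, hits);
            }
        }
    }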

The second is that caching can hurt performance, and so there is an 
option to turn it off. I take you to be saying that once the cache miss 
rate rises above some threshold, caching becomes a performance drag 
rather than a help. I believe that is generally true of caching, but it 
does not hold for my use case. (By the way, I wanted to measure the 
effect of turning caching off on my tests, but I wasn't able to find 
the option; I couldn't think of good keywords to search on.)
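To make the threshold concrete, here is the back-of-the-envelope model 
I have in mind; the costs are hypothetical placeholders, not 
measurements:

    // Caching pays off only when the expected cost per access beats the
    // cost of simply recreating the object every time.
    public class CacheBreakEven {
        public static void main(String[] args) {
            double hitCost    = 10; // ns per cache hit (hypothetical)
            double missCost   = 15; // ns per miss: probe plus insert (hypothetical)
            double createCost = 40; // ns to build the object from scratch (hypothetical)

            // No cache: every access costs createCost.
            // Cache:    h*hitCost + (1-h)*(missCost + createCost).
            // Setting the two equal and solving for the hit rate h:
            double breakEven = missCost / (createCost + missCost - hitCost);
            System.out.printf("caching wins once the hit rate exceeds %.1f%%%n",
                    breakEven * 100);
        }
    }

On those made-up numbers the break-even point is a hit rate of one in 
three; in my workload nearly every access is a hit, which is why the 
cache helps rather than hurts.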

You didn't mention improving feature value access performance. It 
seems to me that could be improved as well.
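For instance, hoisting the Feature handle resolution out of the hot 
path is the kind of thing I mean; a rough sketch, with placeholder type 
and feature names:

    import org.apache.uima.cas.CAS;
    import org.apache.uima.cas.Feature;
    import org.apache.uima.cas.FeatureStructure;
    import org.apache.uima.cas.Type;

    // Resolve the Type and Feature handles once per type system, so the
    // hot loop pays only for getStringValue() itself.
    public class MemberAccessor {
        private final Feature member;

        public MemberAccessor(CAS cas) {
            Type type = cas.getTypeSystem().getType("com.example.MyType"); // placeholder
            this.member = type.getFeatureByBaseName("member");             // placeholder
        }

        public String get(FeatureStructure fs) {
            return fs.getStringValue(member);
        }
    }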

I'm puzzled by your comment about the likelihood of being able to 
improve UIMA's performance for all or even most scenarios. Is this 
based on a survey of typical UIMA applications? What was the decision 
process that led to the current map implementation? I just wonder what 
criteria a proposed change would need to meet. My feeling continues to 
be that there is room for improvement.

Thanks,
Mike

From:   Eddie Epstein <[email protected]>
To:     [email protected], 
Date:   06/26/2013 08:51 AM
Subject:        Re: [jira] [Commented] (UIMA-3017) Getting feature value 
from feature structure longer than expected



On Tue, Jun 25, 2013 at 12:17 PM, Michael A Barborak <
[email protected]> wrote:

> Hi,
>
> I added an access counter to MyClass that I incremented with each call to
> getMemberStringValue(). I then printed that access count at the end of
> each round. This was my result:
>
> Test 4. Get a member string from a POJO
> 100000000
> round 0 total time 4: 0.109360998s
> 200000000
> round 1 total time 4: 0.103437715s
> 300000000
> round 2 total time 4: 0.103085417s
> 400000000
> round 3 total time 4: 0.102891805s
> 500000000
> round 4 total time 4: 0.101912657s
>
> Slower but in the ballpark. This seems to indicate to me that the code
> isn't being optimized away.
>

Thanks for the confirmation.


>
> I guess I don't understand how the principles you mention must incur the
> measured performance penalties. Or maybe you're not saying that? In
>
Actually, yes I am saying that these layers will incur some performance
penalty. I don't know how much the penalty for the current code can be
reduced for all or even most scenarios.


> particular it seems something like what I did for test 6 could be
> implemented within UIMA rather than on top of it. I would suppose that
> some optimization within UIMA might improve retrieving strings too. In 
any
> case, I've deployed such optimizations in our application and seen a
> performance improvement.
>
The JCas hashmap can have significant memory overhead, and for some
applications where most FS are never retrieved within the same process
the hashmap actually has significant CPU overhead as well. That's why
the JCas hashmap can be turned off, or maybe it is turned off by default
and can be turned on, don't remember :)

Eddie
