Thanks, Peter, for measuring :-)

Flattening has a performance cost: the time to create the flattened version of
the index.  Once that version is created, iterating is faster.

Some observations:

1) It is only applied to "sorted" indexes (like the Annotation index).

2) It is not done immediately; it's done once some internal measurements
determine that a large number of calls are being made to the iterator
sub-methods that manage "merging" of multiple types / subtypes.  If you iterate
over a type which has no subtypes, no flattening is needed (and it's not done
for that case).

3) The time to flatten is the time to make one iteration through the type and
all of its subtypes.  In your test, the output is saying that 2 indexes were
flattened, and the total time to do both was about 27.6 milliseconds.

4) After an index is flattened, it is used, hopefully many times, before the run
ends, or before something updates the index for this type (or any of its
subtypes).  The readout below shows that there were 29,813 uses of the flattened
index.  The 0 discards means that after the flattened index was created, there
were no subsequent updates to the indexes that invalidated it.

5) The amount of improvement varies by how much work is involved in managing the
merging of a type and all of its subtypes.  This work is dependent on the number
of subtypes, among other factors.  One motivating case had close to 1000
subtypes.  Furthermore, the amount of improvement is also a function of the
ratio of the time spent doing the analysis to the time spent managing the
iterator.  Typically, UIMA annotators are pretty CPU intensive, so even if you
reduced the iteration overhead to 0 you might not see much change.
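
To make points 2) through 4) concrete, here is a minimal sketch of the general
idea in Java.  It is NOT the actual UIMA code: the class name LazyFlatIndex, its
fields, and the way merge work is counted are all assumptions made up for
illustration.  It only shows the shape of the mechanism: merge per-type sorted
lists on each iteration, flatten once enough merge work has been observed, reuse
the flattened copy, and discard it when any index is updated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not UIMA's implementation. */
class LazyFlatIndex {
    private final List<List<Integer>> perTypeSorted = new ArrayList<>();
    private final int threshold;   // merge steps to observe before flattening
    private int mergeSteps;        // merge work seen since the last update
    private List<Integer> flat;    // cached flattened copy, or null
    int creations, uses, discards; // counters, named after the readout

    LazyFlatIndex(List<List<Integer>> sortedPerType, int threshold) {
        for (List<Integer> l : sortedPerType) {
            perTypeSorted.add(new ArrayList<>(l)); // private mutable copies
        }
        this.threshold = threshold;
    }

    /** Any update to a type's index invalidates the flattened copy. */
    void add(int typeSlot, int value) {
        List<Integer> l = perTypeSorted.get(typeSlot);
        int pos = Collections.binarySearch(l, value);
        l.add(pos < 0 ? -pos - 1 : pos, value);
        if (flat != null) {
            flat = null;
            discards++;
        }
        mergeSteps = 0;
    }

    /** Returns all elements of the type and its subtypes in sorted order. */
    List<Integer> iterate() {
        if (flat != null) {
            uses++;
            return flat;
        }
        List<Integer> merged = merge();
        // A type with no subtypes needs no merging, so it is never flattened.
        if (perTypeSorted.size() > 1 && mergeSteps >= threshold) {
            flat = Collections.unmodifiableList(merged);
            creations++;
        }
        return merged;
    }

    /** Simple k-way merge; each emitted element counts as one merge step. */
    private List<Integer> merge() {
        int[] pos = new int[perTypeSorted.size()];
        List<Integer> out = new ArrayList<>();
        while (true) {
            int best = -1;
            for (int i = 0; i < pos.length; i++) {
                List<Integer> l = perTypeSorted.get(i);
                if (pos[i] < l.size() && (best < 0
                        || l.get(pos[i]) < perTypeSorted.get(best).get(pos[best]))) {
                    best = i;
                }
            }
            if (best < 0) {
                return out;
            }
            out.add(perTypeSorted.get(best).get(pos[best]++));
            mergeSteps++;
        }
    }
}
```

In this sketch the cost of flattening is one ordinary merged iteration, as in
point 3), and the benefit only shows up if the flattened copy is reused many
times before an update discards it, as in point 4).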

-Marshall 

On 6/26/2015 10:04 AM, Peter Klügl wrote:
> Hi,
>
> I finally did some performance testing concerning ruta and the new flat
> index stuff. Unfortunately, there is hardly any performance difference.
>
> Marshall, can you help me interpret the output?
> Time to flatten was 27,619 microseconds
> Flatten tuning, threshold: 50, creations: 2 uses: 29813, discards: 0
>
> The test bed consists of the rules of example-projects/GermanNovels
> applied on "The Idiot" (about 300,000 tokens). Even if I only apply a
> rule like "ANY;" (the type ANY has 19 subtypes) there is hardly any
> difference.
>
> Best,
>
> Peter
>
>
>
>
