Re: Flat index and Ruta

Peter Klügl Fri, 26 Jun 2015 07:34:35 -0700

Thanks Marshall :-)


On 26.06.2015 at 16:27, Marshall Schor wrote:
> Thanks, Peter, for measuring :-)
>
> Flattening has a performance cost: the time to create the flattened version
> of the index.  Once that is created, iterating is faster.
>
> Some observations:
>
> 1) It is only applied to "sorted" indexes (like the Annotation index).
>
> 2) It is not done immediately; it's done once some internal measurements
> determine that a large number of calls are being made to the iterator
> sub-methods that manage "merging" of multiple types / subtypes.  If you
> iterate over a type which has no subtypes, no flattening is needed (and it's
> not done for that case).
>
> 3) The time to flatten is the time to make one iteration through all of the
> type and its subtypes.  In your test, the output is saying that 2 indexes
> were flattened, and the total time to do both was about 27 milliseconds.
>
> 4) After an index is flattened, it is used, hopefully many times, before the
> run ends, or before something updates the index for this type (or any of its
> subtypes).  The readout below shows that there were 29,813 uses of the
> flattened index.  The 0 discards means that after the flattened index was
> created, there were no subsequent updates to the indexes that invalidated it.
>
> 5) The amount of improvement varies by how much work is involved in managing
> the merging of a type and all of its subtypes.  This work depends on the
> number of subtypes, among other factors.  One motivating case had close to
> 1000 subtypes.  Furthermore, the amount of improvement is also a function of
> the ratio of the time spent doing the analysis to the time spent managing the
> iterator.  Typically, UIMA annotators are pretty CPU intensive, so even if you
> reduced the iteration overhead to 0 you might not see much change.
>
> -Marshall 
>
> On 6/26/2015 10:04 AM, Peter Klügl wrote:
>> Hi,
>>
>> I finally did some performance testing concerning ruta and the new flat
>> index stuff. Unfortunately, there is hardly any performance difference.
>>
>> Marshall, can you help me interpret the output?
>> Time to flatten was 27,619 microseconds
>> Flatten tuning, threshold: 50, creations: 2 uses: 29813, discards: 0
>>
>> The test bed consists of the rules of example-projects/GermanNovels
>> applied on "The Idiot" (about 300,000 tokens). Even if I only apply a
>> rule like "ANY;" (the type ANY has 19 subtypes) there is hardly any
>> difference.
>>
>> Best,
>>
>> Peter
>>
>>
>>
>>

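The trade-off discussed in this thread (merge the per-type sorted lists on every iteration, or pay once to build a flat snapshot and reuse it until an update invalidates it) can be sketched roughly as follows. This is a minimal illustration only and is not UIMA's implementation: the class FlatIndexSketch, its methods, and the way the threshold is counted are all hypothetical, and only the counter names (creations, uses, discards) are borrowed from the readout quoted above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; NOT UIMA code. All names here are hypothetical. */
class FlatIndexSketch {
    // One sorted list of keys per type/subtype (stand-in for per-type indexes).
    private final List<List<Integer>> perTypeLists = new ArrayList<>();
    private final int threshold;          // merged iterations before flattening
    private int mergedIterations = 0;
    private List<Integer> flat = null;    // cached flattened snapshot, if any
    int creations = 0, uses = 0, discards = 0;

    FlatIndexSketch(int threshold) { this.threshold = threshold; }

    /** Registers a new (sub)type and returns its slot number. */
    int addType() {
        perTypeLists.add(new ArrayList<>());
        return perTypeLists.size() - 1;
    }

    /** Inserts a key in sorted position; any update discards the snapshot. */
    void add(int type, int key) {
        List<Integer> list = perTypeLists.get(type);
        int pos = Collections.binarySearch(list, key);
        list.add(pos < 0 ? -pos - 1 : pos, key);
        if (flat != null) { flat = null; discards++; }
    }

    /** Returns all keys of all types in sorted order. */
    List<Integer> sortedView() {
        if (flat != null) { uses++; return flat; }   // fast path: reuse snapshot
        List<Integer> merged = Collections.unmodifiableList(merge());
        // Flatten only when there is something to merge and it happens often.
        if (perTypeLists.size() > 1 && ++mergedIterations >= threshold) {
            flat = merged;
            creations++;
            mergedIterations = 0;
        }
        return merged;
    }

    /** Simple k-way merge: repeatedly take the smallest head element. */
    private List<Integer> merge() {
        int[] cursor = new int[perTypeLists.size()];
        List<Integer> out = new ArrayList<>();
        while (true) {
            int best = -1;
            for (int t = 0; t < cursor.length; t++) {
                List<Integer> list = perTypeLists.get(t);
                if (cursor[t] < list.size() && (best < 0
                        || list.get(cursor[t]) < perTypeLists.get(best).get(cursor[best]))) {
                    best = t;
                }
            }
            if (best < 0) return out;
            out.add(perTypeLists.get(best).get(cursor[best]++));
        }
    }

    public static void main(String[] args) {
        FlatIndexSketch idx = new FlatIndexSketch(50);
        int parent = idx.addType(), child = idx.addType();
        for (int i = 0; i < 1000; i++) idx.add(i % 2 == 0 ? parent : child, i);
        for (int i = 0; i < 10_000; i++) idx.sortedView();
        System.out.println("creations: " + idx.creations
                + " uses: " + idx.uses + " discards: " + idx.discards);
    }
}
```

For scale, the readout quoted above reports 27,619 microseconds spent on flattening and 29,813 uses, which works out to an amortized creation cost of roughly 0.93 microseconds per use. Whether that pays off depends on how much merge work each use avoids, which is point 5 in Marshall's list.
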