Thanks Marshall :-)
On 26.06.2015 at 16:27, Marshall Schor wrote:
> Thanks, Peter, for measuring :-)
>
> Flattening has a performance cost: the time to create the flattened
> version of the index. Once that is created, iterating is faster.
>
> Some observations:
>
> 1) It is only applied to "sorted" indexes (like the Annotation index).
>
> 2) It is not done immediately; it's done once some internal measurements
> determine that a large number of calls are being made to the iterator
> sub-methods that manage "merging" of multiple types / subtypes. If you
> iterate over a type which has no subtypes, no flattening is needed (and
> it's not done for that case).
>
> 3) The time to flatten is the time to make one iteration through all of
> the type and its subtypes. In your test, the output is saying that 2
> indexes were flattened, and the total time to do both was 27 milliseconds.
>
> 4) After an index is flattened, it is used, hopefully many times, before
> the run ends, or before something updates the index for this type (or any
> of its subtypes). The readout below shows that there were 29,813 uses of
> the flattened index. The 0 discards means that after the flattened index
> was created, there were no subsequent updates to the indexes that
> invalidated it.
>
> 5) The amount of improvement varies by how much work is involved in
> managing the merging of a type and all of its subtypes. This work depends
> on the number of subtypes, among other factors. One motivating case had
> close to 1000 subtypes. Furthermore, the amount of improvement is also a
> function of the ratio of the time spent doing the analysis to the time
> spent managing the iterator. Typically, UIMA annotators are pretty CPU
> intensive, so even if you reduced the iteration overhead to 0 you might
> not see much change.
>
> -Marshall
>
> On 6/26/2015 10:04 AM, Peter Klügl wrote:
>> Hi,
>>
>> I finally did some performance testing concerning Ruta and the new flat
>> index stuff. Unfortunately, there is hardly any performance difference.
>>
>> Marshall, can you help me interpret the output?
>> Time to flatten was 27,619 microseconds
>> Flatten tuning, threshold: 50, creations: 2 uses: 29813, discards: 0
>>
>> The test bed consists of the rules of example-projects/GermanNovels
>> applied to "The Idiot" (about 300,000 tokens). Even if I only apply a
>> rule like "ANY;" (the type ANY has 19 subtypes) there is hardly any
>> difference.
>>
>> Best,
>>
>> Peter
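
For readers who want to see the behaviour described in points 2) to 4) in code, here is a minimal Java sketch. It is not the UIMA implementation: the class FlattenedIndexSketch, its fields and its methods are all hypothetical, and it assumes that the "internal measurement" is simply a running count of merge steps compared against a threshold. It only illustrates the idea of merging per-type sorted lists on every iteration until a threshold is reached, then reusing a cached flat copy until an update invalidates it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is NOT how UIMA implements flattening.
final class FlattenedIndexSketch {
    // One sorted list per type/subtype (stand-in for the per-type sorted indexes).
    private final List<List<Integer>> perType = new ArrayList<>();
    private final int threshold;   // assumed: merge-step count that triggers flattening
    private long mergeSteps;       // assumed "internal measurement"
    private List<Integer> flat;    // cached flattened view, null when absent
    int creations, uses, discards; // counters named after the readout fields

    FlattenedIndexSketch(List<List<Integer>> sortedPerType, int threshold) {
        for (List<Integer> l : sortedPerType) perType.add(new ArrayList<>(l));
        this.threshold = threshold;
    }

    // Returns all elements of the type and its subtypes in sorted order.
    List<Integer> iterate() {
        if (flat != null) { uses++; return flat; }       // fast path: reuse flat copy
        List<Integer> merged = mergeAll();               // slow path: merge subtypes
        if (perType.size() > 1 && mergeSteps >= threshold) {
            flat = merged;                               // flattening costs one full pass
            creations++;
        }
        return merged;
    }

    // Any update to the type or a subtype invalidates the flattened copy.
    void add(int typeIndex, int value) {
        List<Integer> list = perType.get(typeIndex);
        int at = Collections.binarySearch(list, value);
        list.add(at < 0 ? -at - 1 : at, value);
        if (flat != null) { flat = null; discards++; mergeSteps = 0; }
    }

    // Simple k-way merge; each emitted element counts as one merge step
    // when more than one list is involved.
    private List<Integer> mergeAll() {
        int k = perType.size();
        int[] pos = new int[k];
        List<Integer> out = new ArrayList<>();
        while (true) {
            int best = -1;
            for (int i = 0; i < k; i++) {
                if (pos[i] < perType.get(i).size()
                        && (best < 0
                            || perType.get(i).get(pos[i]) < perType.get(best).get(pos[best]))) {
                    best = i;
                }
            }
            if (best < 0) return out;
            out.add(perType.get(best).get(pos[best]++));
            if (k > 1) mergeSteps++;
        }
    }
}
```

Read against the numbers in the readout, and purely as arithmetic: 27,619 microseconds of flattening spread over 29,813 uses is a little under 1 microsecond of flattening cost per use, so the creation cost is amortized quickly as long as there are no discards.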
