Hi!
Thanks a lot for your help. I will try both of you suggestions (taxo
index and per-segment ord ranges).
Thanks for clarifying, that I have to iterate the ords. I wasn't sure,
if I didn't just overlook something obvious. Like some way to do an
advanceExact on ords.
Regards
harry
On 01.07.22 18:58, Greg Miller wrote:
To address the last topic (building up ordinal ranges per-segment),
what I'm thinking is that you'd iterate all unique ordinals in the
SSDV field and "memorize" the ordinal range for each dimension
up-front, but on a per-segment basis. This would be very similar to
what DefaultSortedSetDocValuesReaderState#createOneFlatFacetDimState
is doing, but you'd do it per-segment. That per-segment state
information would only need to be done once, since segments are
immutable. Then, when you're iterating all your hits within each
segment, you can determine whether-or-not each stored doc value for
the doc is within the dim range using your segment-level ordinal range
information, avoiding the need to do the global mapping. You can
short-circuit the document-level ordinal iteration once an ordinal
goes beyond the range since the ordinals are stored in sorted order,
but in general, you're still doing a linear operation per document. No
way to avoid that. But you can also resolve the BytesRef within the
context of each segment, completely avoiding the need for a global
mapping.
Not sure this will result in any major performance improvements for
you, but it does let you avoid building a global ordinal map and doing
map lookups within the tight loop.
Cheers,
-Greg
On Fri, Jul 1, 2022 at 2:35 AM Harald Braumann <braum...@m2n.at> wrote:
Hi!
On 01.07.22 00:46, Greg Miller wrote:
Have you considered taxonomy faceting for your use-case? Because the
taxonomy structure is maintained in a separate index, it's
(relatively) trivial to iterate all direct child ordinals of a given
dimension. The cost of mapping to a global ordinal space is done when
the index is merged.
Thanks for the tip. I will certainly look into it.
Separately, I'd be curious about where you're running into performance
issues within the context of your system. Is the cost you're concerned
with building up the ordinal map? That's certainly expensive, but it's
a one-time cost (until you refresh your index).
That's not the problem. The index changes rarely during operation, anyways.
Or are you concerned
with the actual map lookup within your tight loop?
Yes. The index I tested has about 3M documents and overall about 180M
doc value ords. Just iterating, without retrieving the actual values or
building the result map takes close to 3s. All the time seems to be
spent in SortedSetDocValues.nextOrd and LongValues.get.
If the latter, you
could consider doing more work at the slice-level by separately
determining the child ords for each dim ord within the context of each
segment (there's no off-the-shelf code for this that I'm aware of, so
you'd have to roll your own).
I was thinking about this as well. So lets say, I have the ord ranges
for dimensions per segment. I guess, I could easily test this by making
sure, there is only one segment, so I can use the global ord ranges I
already got. What would be the best way to jump to those ords for each
document? If I use SortedSetDocValues, I'd still had to iterator through
all ords per document, right? Maybe that's not really the problem. I've
tried to time this, but both the profiler and hand crafted timing code
massively skew the results, so I'm not sure, I trust those measurements.
Cheers
harry
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org