You could flatten the intervals into separate documents, one per interval. This would make retrieving all of a document's sectors a bit clumsier, but searching would be simpler and the number of fields would be constant. Each document would look like this:

document_id: xyz
sector_num: ...
start: ...
end: ...

and then the search query would narrow down to a concrete document/sector and interval in each boolean clause. The document ID will be repeated over multiple Lucene documents, but this isn't a problem.

There is a risk of exceeding the max number of clauses in either of these solutions if the number of sectors goes too high. Maybe a plain old SQL database would be a better fit here?

Dawid

On Mon, Jan 20, 2025 at 2:34 PM Cleber Muramoto <cleber.muram...@gmail.com> wrote:
>
> Hello.
>
> My model has the following Root structure, which consists of N
> "TimeSpaceIntervals":
>
> {
>   id: <int>,
>   intervals: [
>     {
>       sector: <string>,
>       entry: <int>,
>       exit: <int>
>     }, ....
>   ]
> }
>
> (exit>=entry is guaranteed)
>
> Given a new Root record, I must check for intersections with the incoming
> data, which means, finding any document in the index having a sector whose
> time interval [entry, exit] overlaps with a corresponding sector of the
> incoming data.
>
> Currently, given a Root r, I am converting the records to documents as
> follows:
>
> public Document doc(Root r) {
>   var doc = new Document();
>   doc.add(new IntPoint("id", r.id));
>
>   r.intervals.forEach(i -> {
>     doc.add(new IntPoint(i.sector + ".entry", i.entry));
>     doc.add(new IntPoint(i.sector + ".exit", i.exit));
>   });
>   return doc;
> }
>
> And for a given Root n, the intersection query becomes:
>
> Query intersection(Root n) {
>   var q = new BooleanQuery.Builder();
>   // exclude same id
>   q.add(new BooleanClause(IntPoint.newExactQuery("id", n.id),
>       Occur.MUST_NOT));
>   // find overlapping sectors
>   n.intervals.forEach(i -> {
>     var sub = new BooleanQuery.Builder();
>     // other docs must start before this exits
>     sub.add(new BooleanClause(IntPoint.newRangeQuery(i.sector +
>         ".entry", 0, i.exit), Occur.FILTER));
>     // other docs must end after this starts
>     sub.add(new BooleanClause(IntPoint.newRangeQuery(i.sector +
>         ".exit", i.entry, Integer.MAX_VALUE), Occur.FILTER));
>
>     q.add(new BooleanClause(sub.build(), Occur.SHOULD));
>   });
>
>   return q.build();
> }
>
> The problem with this approach is that the index will have as many fields
> as twice the cardinality of sectors. Currently, the number of distinct
> sectors is small (< 500), so I think this strategy is OK, but I don't like the
> idea of having "dynamic fields".
>
> Given the intersection query requirement, is there a better way to model
> the index, aside from creating multiple documents per Root entry?
>
> Regards

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
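
For readers of the archive, the flattened layout suggested in the reply can be sketched in plain Java, without Lucene, to show the shape of the data and the overlap test that each boolean clause would have to express. This is only an illustration under assumptions: the names FlattenedIntervals, Row, flatten, overlaps and intersecting are hypothetical and appear in neither message, and a linear scan stands in for the index lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "one row per interval" layout; not Lucene code.
class FlattenedIntervals {
    record Interval(String sector, int entry, int exit) {}
    record Root(int id, List<Interval> intervals) {}

    // One flattened row per (document, sector) interval: constant field count.
    record Row(int documentId, String sector, int start, int end) {}

    // Turn a Root with N intervals into N rows that repeat the document id.
    static List<Row> flatten(Root r) {
        List<Row> rows = new ArrayList<>();
        for (Interval i : r.intervals()) {
            rows.add(new Row(r.id(), i.sector(), i.entry(), i.exit()));
        }
        return rows;
    }

    // Closed intervals in the same sector overlap when the stored row starts
    // no later than the query exits and ends no earlier than the query enters.
    static boolean overlaps(Row row, Interval q) {
        return row.sector().equals(q.sector())
                && row.start() <= q.exit()
                && row.end() >= q.entry();
    }

    // Ids of other documents with at least one overlapping interval.
    // A linear scan is used here purely for illustration.
    static List<Integer> intersecting(List<Row> index, Root incoming) {
        List<Integer> ids = new ArrayList<>();
        for (Row row : index) {
            if (row.documentId() == incoming.id()) {
                continue; // exclude same id
            }
            for (Interval q : incoming.intervals()) {
                if (overlaps(row, q) && !ids.contains(row.documentId())) {
                    ids.add(row.documentId());
                }
            }
        }
        return ids;
    }
}
```

In an actual index, the sector comparison and the two range checks in overlaps would presumably become one sub-query per incoming interval over the fixed fields, which is where the clause-count concern raised in the reply comes from.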