You could flatten the intervals into separate documents, one per
sector interval. This would make retrieving all of a document's sectors
a bit clumsier, but searching would be simpler and the number of fields
would be constant. Each document would look like this:

document_id: xyz
sector_num: ...
start: ...
end: ...

Each boolean clause of the search query would then narrow down to a
concrete document/sector and interval. The document ID will be
repeated across multiple Lucene documents, but this isn't a problem.
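
To make the flattened layout concrete, here is a rough plain-Java sketch
of the idea, deliberately without any Lucene calls. The Root and Interval
types mirror the structure in the quoted message; FlatDoc, flatten,
overlaps and intersections are hypothetical names used only for
illustration, and the sector is kept as a string as in the original
model. In a real index each FlatDoc would become one Lucene document and
the nested loop would be replaced by a query.

```java
import java.util.ArrayList;
import java.util.List;

final class FlatIntervals {
    // Hypothetical input types mirroring the Root structure from the quoted message.
    record Interval(String sector, int entry, int exit) {}
    record Root(int id, List<Interval> intervals) {}

    // One flattened row per (document, sector interval); a constant set of fields.
    record FlatDoc(int documentId, String sector, int start, int end) {}

    // Turn one Root into as many flat documents as it has intervals.
    static List<FlatDoc> flatten(Root r) {
        List<FlatDoc> out = new ArrayList<>();
        for (Interval i : r.intervals()) {
            out.add(new FlatDoc(r.id(), i.sector(), i.entry(), i.exit()));
        }
        return out;
    }

    // Closed intervals overlap when each one starts no later than the other ends.
    // Rows from the same document are never reported.
    static boolean overlaps(FlatDoc a, FlatDoc b) {
        return a.documentId() != b.documentId()
            && a.sector().equals(b.sector())
            && a.start() <= b.end()
            && a.end() >= b.start();
    }

    // Brute-force stand-in for the search: every indexed row that overlaps
    // some interval of the incoming Root in the same sector.
    static List<FlatDoc> intersections(List<FlatDoc> index, Root incoming) {
        List<FlatDoc> hits = new ArrayList<>();
        for (FlatDoc probe : flatten(incoming)) {
            for (FlatDoc candidate : index) {
                if (overlaps(candidate, probe)) {
                    hits.add(candidate);
                }
            }
        }
        return hits;
    }
}
```

The distinct documentId values among the hits give the conflicting Root
records, which is the extra grouping step that makes retrieval a bit
clumsier than with one document per Root.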

With either of these solutions there is a risk of exceeding the
maximum number of boolean clauses (1024 by default) if the number of
sectors grows too high. Maybe a plain old SQL database would be a
better fit here?

Dawid

On Mon, Jan 20, 2025 at 2:34 PM Cleber Muramoto
<cleber.muram...@gmail.com> wrote:
>
> Hello.
>
> My model has the following Root structure, which consists of N
> "TimeSpaceIntervals":
>
> {
>   id: <int>,
>   intervals: [
>   {
>     sector: <string>,
>      entry: <int>,
>      exit: <int>
>    }, ....
>   ]
> }
>
> (exit>=entry is guaranteed)
>
> Given a new Root record, I must check for intersections with the incoming
> data, which means, finding any document in the index having a sector whose
> time interval [entry, exit] overlaps with a corresponding sector of the
> incoming data.
>
> Currently, given a Root r, I am converting the records to documents as
> follows:
>
>     public Document doc(Root r) {
>         var doc = new Document();
>         doc.add(new IntPoint("id", r.id));
>
>         r.intervals.forEach(i -> {
>             doc.add(new IntPoint(i.sector + ".entry", i.entry));
>             doc.add(new IntPoint(i.sector + ".exit", i.exit));
>         });
>         return doc;
>     }
>
> And for a given Root n, the intersection query becomes:
>
>     Query intersection(Root n) {
>         var q = new BooleanQuery.Builder();
>         // exclude same id
>         q.add(new BooleanClause(
>             IntPoint.newExactQuery("id", n.id), Occur.MUST_NOT));
>         // find overlapping sectors
>         n.intervals.forEach(i -> {
>             var sub = new BooleanQuery.Builder();
>             // other docs must start before this exits
>             sub.add(new BooleanClause(
>                 IntPoint.newRangeQuery(i.sector + ".entry", 0, i.exit),
>                 Occur.FILTER));
>             // other docs must end after this starts
>             sub.add(new BooleanClause(
>                 IntPoint.newRangeQuery(i.sector + ".exit", i.entry,
>                     Integer.MAX_VALUE),
>                 Occur.FILTER));
>
>             q.add(new BooleanClause(sub.build(), Occur.SHOULD));
>         });
>
>         return q.build();
>     }
>
> The problem with this approach is that the index will have twice as many
> fields as there are distinct sectors. Currently, the number of distinct
> sectors is small (< 500), so I think this strategy is OK, but I don't like
> the idea of having "dynamic fields".
>
> Given the intersection query requirement, is there a better way to model
> the index, aside from creating multiple documents per Root entry?
>
> Regards
