Suggestions for modeling an Index

Cleber Muramoto Mon, 20 Jan 2025 12:33:30 -0800

Hello.

My model has the following Root structure, which consists of N
"TimeSpaceIntervals":


{
  id: <int>,
  intervals: [
    {
      sector: <string>,
      entry: <int>,
      exit: <int>
    }, ...
  ]
}

(exit>=entry is guaranteed)

Given a new Root record, I must check whether it intersects the data
already indexed. That means finding any document in the index that has a
sector whose time interval [entry, exit] overlaps the interval of the same
sector in the incoming record.
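
To make the overlap condition concrete, here is a minimal plain-Java sketch of the test for two closed intervals in the same sector. The names IntervalOverlap and Interval are hypothetical and used only for illustration; the sketch assumes exit >= entry and treats touching endpoints as an overlap.

```java
// Hypothetical helper; IntervalOverlap and Interval are illustrative names only.
final class IntervalOverlap {

    /** Closed interval [entry, exit] within a sector; exit >= entry is assumed. */
    record Interval(String sector, int entry, int exit) {}

    /** True when both intervals share a sector and their closed ranges share a point. */
    static boolean overlaps(Interval a, Interval b) {
        return a.sector().equals(b.sector())
                && a.entry() <= b.exit()   // a starts at or before b exits
                && b.entry() <= a.exit();  // b starts at or before a exits
    }

    public static void main(String[] args) {
        var a = new Interval("S1", 10, 20);
        System.out.println(overlaps(a, new Interval("S1", 20, 30))); // true (shared endpoint)
        System.out.println(overlaps(a, new Interval("S1", 21, 30))); // false
        System.out.println(overlaps(a, new Interval("S2", 10, 20))); // false (other sector)
    }
}
```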

Currently, given a Root r, I am converting the records to documents as
follows:

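    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.IntPoint;

    public Document doc(Root r) {
        var doc = new Document();
        doc.add(new IntPoint("id", r.id));

        // two point fields per sector: "<sector>.entry" and "<sector>.exit"
        r.intervals.forEach(i -> {
            doc.add(new IntPoint(i.sector + ".entry", i.entry));
            doc.add(new IntPoint(i.sector + ".exit", i.exit));
        });
        return doc;
    }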

And for a given Root n, the intersection query becomes:

    import org.apache.lucene.document.IntPoint;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;

    Query intersection(Root n) {
        var q = new BooleanQuery.Builder();
        // exclude same id
        q.add(IntPoint.newExactQuery("id", n.id), Occur.MUST_NOT);
        // find overlapping sectors
        n.intervals.forEach(i -> {
            var sub = new BooleanQuery.Builder();
            // other docs must start at or before this exit
            // (the lower bound of 0 assumes non-negative entry values)
            sub.add(IntPoint.newRangeQuery(i.sector + ".entry", 0, i.exit), Occur.FILTER);
            // other docs must end at or after this entry
            sub.add(IntPoint.newRangeQuery(i.sector + ".exit", i.entry, Integer.MAX_VALUE),
                    Occur.FILTER);

            q.add(sub.build(), Occur.SHOULD);
        });

        return q.build();
    }
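
For checking index results in a test, the semantics this query is meant to have can be written as a brute-force comparison over in-memory records. This is only a sketch under stated assumptions: IntersectionReference and its record types are hypothetical names, entry values are assumed non-negative (matching the lower bound of 0 above), and each Root is assumed to hold at most one interval per sector, since several intervals in one sector would share the same pair of fields.

```java
import java.util.List;

// Hypothetical reference check; not part of Lucene. Names are illustrative only.
final class IntersectionReference {

    record Interval(String sector, int entry, int exit) {}

    record Root(int id, List<Interval> intervals) {}

    /** Mirrors the query: a different id and at least one overlapping same-sector pair. */
    static boolean intersects(Root incoming, Root indexed) {
        if (incoming.id() == indexed.id()) {
            return false; // corresponds to the MUST_NOT clause on "id"
        }
        for (Interval i : incoming.intervals()) {
            for (Interval o : indexed.intervals()) {
                boolean sameSector = i.sector().equals(o.sector());
                // corresponds to the two FILTER range clauses of one SHOULD sub-query
                if (sameSector && o.entry() <= i.exit() && o.exit() >= i.entry()) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Ids of all indexed roots that intersect the incoming one, in input order. */
    static List<Integer> matchingIds(Root incoming, List<Root> indexed) {
        return indexed.stream()
                .filter(r -> intersects(incoming, r))
                .map(Root::id)
                .toList();
    }
}
```

Comparing matchingIds against the ids returned by the index for a handful of hand-written records is one way to confirm the field layout behaves as intended.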

The problem with this approach is that the index ends up with two fields
per distinct sector. Currently the number of distinct sectors is small
(< 500), so I think this strategy is OK, but I don't like the idea of
having "dynamic fields".

Given the intersection query requirement, is there a better way to model
the index, aside from creating multiple documents per Root entry?

Regards
