Morning everyone,
in our company we are running a POC using Apache Calcite, and we would like to
collect some feedbacks from you for the scenario mentioned below.
There is a service API that allows retrieving some `Book`s, and we would like
to build a table adapter on top of this service; this API
only accepts a set of IDs, and if no IDs are provided, no result is returned.
```
interface BookService {
/** Returns the books matching the given IDs.
*
* <p>If not IDs is provided, no result is returned.
*/
List<Book> findBooksByIds(Set<String> ids);
}
record Book(String id, String title) {};
```
A requirement of this table is that it has to support join, and we would like
to support joining by ID in an efficient way.
The goal is to define a rule that forces the query planner to always push down
join predicates into a table scan.
Given the following `book` table:
```java
/** A table which represents books, queryable only by their ID. */
final class BookTable extends AbstractTable implements FilterableTable {
private final BookService service;
BookTable(BookService service) {
this.service = service;
}
@Override
public RelDataType getRowType(RelDataTypeFactory typeFactory) {
return new RelDataTypeFactory.Builder(typeFactory)
.add("id", SqlTypeName.VARCHAR)
.add("title", SqlTypeName.VARCHAR)
.build();
}
@Override
public Enumerable<Object[]> scan(DataContext root, List<RexNode> filters) {
Set<String> bookIds = getBooksId(filters);
List<Object[]> result = service.findBooksByIds(bookIds)
.stream()
.map(b -> new Object[]{b.id, b.title})
.toList();
return Linq4j.asEnumerable(result);
}
private static Set<String> getBooksId(List<RexNode> filters) {
if (filters.size() != 1) {
throw new IllegalArgumentException("Expected one filter to the ID, found:
%d".formatted(filters.size()));
}
RexNode filter = filters.get(0);
RexNode leftCondition = ((RexCall) filter).getOperands().get(0);
RexNode rightCondition = ((RexCall) filter).getOperands().get(1);
if (leftCondition instanceof RexInputRef left
&& rightCondition instanceof RexLiteral right
// The index of the ID column is 1.
&& left.getIndex() == 1) {
if (filter.isA(SqlKind.EQUALS)) {
String bookId = right.getValue2().toString();
return ImmutableSet.of(bookId);
}
if (filter.isA(SqlKind.SEARCH)) {
@SuppressWarnings("unchecked")
Sarg<NlsString> searchArguments = right.getValueAs(Sarg.class);
return searchArguments.rangeSet.asRanges().stream()
.map(Range::lowerEndpoint)
.map(NlsString::getValue)
.collect(toSet());
}
}
throw new IllegalArgumentException("Unexpected operator, found:
%s".formatted(filter.getKind()));
}
}
```
The API of the `BookService` always expects a set of IDs, and in case of query
like (assuming an entry in `book` matching the ID `'a'`):
```sql
WITH config (id, val) AS (
VALUES ('a', 3), ('b', 5)
)
SELECT b.* FROM books b
INNER JOIN config ON b.id = config.id
WHERE config.val > 4
```
Calcite produces the following plan:
```
EnumerableCalc(expr#0..2=[{inputs}], id=[$t1], title=[$t2])
EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
EnumerableSort(sort0=[$0], dir0=[ASC])
EnumerableCalc(expr#0..1=[{inputs}], id=[$t0])
EnumerableTableScan(table=[[books]])
EnumerableSort(sort0=[$0], dir0=[ASC])
EnumerableTableScan(table=[[books]])
```
This means Calcite performs a full table scan of the `book` table, and since
the `RexNode` filters in the `scan` method are empty, no result is returned (in
this example we are using a value statement scoped views, but ideally the
solution we are looking for should be valid for other table).
Under some defined circustances, Postgres generates Nested-Loop query plan for
join: it first selects the row of the table A matching a given condition, then
iterates over the retrieved rows and performs a scan of table B looking for
rows that match the join condition; and this seems something we would like to
enforce it here.
In summary, we would like to implement a table which in case of JOIN is capable
of loading the individual IDs matching the API of our service, rather than
performing a full table scan.
Do you have any advises/feedback for us?
Thanks in advance.