FANNG1 opened a new issue, #11588:
URL: https://github.com/apache/gravitino/issues/11588
### Version
main branch
### Describe what's wrong
The Lance REST server's `ListTables` endpoint returns **fully-qualified**
table names (`catalog{delimiter}schema{delimiter}table`) instead of the **leaf
table names** required by the Lance Namespace spec.
As a result, a Spark client using the Lance namespace connector shows
polluted table names in `SHOW TABLES` — each table name embeds the catalog and
schema, e.g. `catalog.schema.my_table` instead of `my_table`.
### Root cause
`lance/lance-common/src/main/java/org/apache/gravitino/lance/common/ops/gravitino/GravitinoLanceNameSpaceOperations.java`
(`listTables`):
```java
List<String> tables =
    Arrays.stream(catalog.asTableCatalog().listTables(Namespace.of(schemaName)))
        .map(ident -> Joiner.on(delimiter).join(catalogName, schemaName, ident.name())) // <-- returns the fully qualified name
        .sorted()
        .collect(Collectors.toList());
```
The parent namespace is already conveyed by the request `id` (`catalog`,
`schema`), so the response must only contain the child table names.
### Why it surfaces in Spark
The Lance Spark connector (`BaseLanceNamespaceSparkCatalog.listTables`)
trusts the response strings as leaf names and wraps them directly:
```java
for (String table : response.getTables()) {
  identifiers.add(Identifier.of(namespace, table)); // table = "catalog.schema.tbl"
}
```
Spark then renders `Identifier.name()`, i.e. the full string returned by the
server, producing the `catalog.schema.table` display.
### Spec & reference implementations
- Spec `ListTables` (`docs/src/spec.yaml`): *"List all child table **names**
of the parent namespace `id`."*
- Reference impls return only the leaf name, e.g. Glue: `.forEach(t ->
tables.add(t.name()))`.
(Note: the shared `ListTablesResponse` schema description mentioning a "full
identifier in string form" refers to the recursive `/v1/table`
(list-all-tables) endpoint, not the per-namespace `ListTables`.)
### Secondary issues in the same method
1. Results are sorted with `.sorted()` and then collected into a `HashSet`
via `Sets.newHashSet(page.items())`, which discards the ordering. A
`ListNamespacesResponse` object is also reused to carry table results, which is
misleading.
2. The method hard-asserts a 2-level namespace (`nsId.levels() == 2`); worth
verifying this matches the connector's `parent` / `single_level_ns`
configuration expectations.
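
The ordering problem in item 1 can be illustrated with a minimal, self-contained sketch. It uses `java.util.HashSet` as a stand-in for Guava's `Sets.newHashSet(...)`, and the class `OrderingSketch` and its methods are hypothetical names for illustration only, not code from the repository:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration; not Gravitino code.
final class OrderingSketch {

  // Mirrors the reported pattern: a sorted list is copied into a HashSet,
  // whose iteration order is unspecified, so the sort is not guaranteed to survive.
  static List<String> viaHashSet(List<String> sorted) {
    return new ArrayList<>(new HashSet<>(sorted));
  }

  // One possible alternative: an insertion-ordered set (or simply keeping the List)
  // preserves the order produced by .sorted().
  static List<String> viaLinkedHashSet(List<String> sorted) {
    return new ArrayList<>(new LinkedHashSet<>(sorted));
  }

  public static void main(String[] args) {
    List<String> sorted = Arrays.asList("a_table", "b_table", "c_table");
    System.out.println("HashSet:       " + viaHashSet(sorted));
    System.out.println("LinkedHashSet: " + viaLinkedHashSet(sorted));
  }
}
```

Whether the response type should carry a set or a list is a design question for the fix; the sketch only shows that the sort is wasted once the items pass through a `HashSet`.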
### How to reproduce
1. Start the Gravitino Lance REST server over a lakehouse catalog with a
schema containing one or more tables.
2. Configure a Spark `lance` namespace catalog (`impl=rest`) pointing at the
server.
3. Run `SHOW TABLES`.
4. Observe that table names appear as `catalog.schema.table` instead of `table`.
### Expected behavior
`ListTables` should return only the leaf table names, so `SHOW TABLES` shows
`table`.
### Additional context
Suggested fix: change the mapping to `.map(ident -> ident.name())` (and
preserve sort order / use a proper list for the paged result).
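
A rough sketch of the difference between the current and the suggested mapping, using plain strings in place of Gravitino's identifier objects. The class `LeafTableNames` and its methods are hypothetical and only approximate the shape of the change; this is not the actual patch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the listTables mapping; not Gravitino code.
final class LeafTableNames {

  // Current behavior: each entry is prefixed with the catalog and schema names.
  static List<String> qualified(String catalog, String schema, String delimiter, String... tables) {
    return Arrays.stream(tables)
        .map(t -> String.join(delimiter, catalog, schema, t))
        .sorted()
        .collect(Collectors.toList());
  }

  // Suggested behavior: return only the leaf names and keep the sorted List as-is.
  static List<String> leafOnly(String... tables) {
    return Arrays.stream(tables)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(qualified("catalog", "schema", ".", "t2", "t1"));
    System.out.println(leafOnly("t2", "t1"));
  }
}
```

The parent namespace is dropped from the response because the caller already supplied it in the request `id`.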