pineappleBest123 opened a new pull request, #47: URL: https://github.com/apache/asterixdb/pull/47
## Summary
Add the schema extraction pipeline for the GSoC 2026 NL2SQL++ project.
This patch builds on top of the servlet infrastructure introduced in
https://github.com/apache/asterixdb/pull/46.
### Changes
- `ColumnInfo`: holds the field name, type string, and primary-key flag;
`toDescriptionString()` returns prompt-ready output
- `DatasetSchema`: holds all columns and supports a pruned column subset
(for `ColumnPruner` in a later PR) and value hints (for `ValueHintsSampler`)
- `DatasetSchemaFormatter`: recursively converts ADM `IAType` objects to
human-readable strings (handles nested records, arrays, multisets, and
nullable unions, with a depth limit of 4)
- `SchemaContextBuilder`: reads Dataset and type metadata from
`MetadataManager` inside a metadata transaction and builds a
`SchemaContext` with one description string per Dataset
- 13 unit tests covering all formatter rules and schema pipeline behavior
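
For reviewers who want a feel for the recursive formatting rules before reading the diff, here is a minimal, self-contained sketch. It does not use AsterixDB's `IAType` API: `TypeFormatSketch` and its `Type` class are hypothetical stand-ins, and the `{{...}}` multiset notation, the `?` nullable suffix, and the `...` truncation marker are assumptions rather than the patch's actual output. Only the `[string]` array form and the depth limit of 4 come from this description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an ADM type tree; NOT AsterixDB's IAType API. */
final class TypeFormatSketch {
    static final int MAX_DEPTH = 4; // depth limit stated in the PR description

    enum Kind { PRIMITIVE, RECORD, ARRAY, MULTISET, NULLABLE }

    static final class Type {
        final Kind kind;
        final String name;              // primitive name, e.g. "int64"
        final Type item;                // wrapped type for ARRAY, MULTISET, NULLABLE
        final Map<String, Type> fields; // record fields in declaration order

        private Type(Kind kind, String name, Type item, Map<String, Type> fields) {
            this.kind = kind;
            this.name = name;
            this.item = item;
            this.fields = fields;
        }

        static Type primitive(String name) { return new Type(Kind.PRIMITIVE, name, null, null); }
        static Type array(Type item) { return new Type(Kind.ARRAY, null, item, null); }
        static Type multiset(Type item) { return new Type(Kind.MULTISET, null, item, null); }
        static Type nullable(Type item) { return new Type(Kind.NULLABLE, null, item, null); }
        static Type record(Map<String, Type> fields) {
            // Copy into a LinkedHashMap so field order is preserved in the output.
            return new Type(Kind.RECORD, null, null, new LinkedHashMap<>(fields));
        }
    }

    static String format(Type type) {
        return format(type, 0);
    }

    private static String format(Type type, int depth) {
        if (depth >= MAX_DEPTH) {
            return "..."; // assumed truncation marker once the depth limit is hit
        }
        switch (type.kind) {
            case PRIMITIVE:
                return type.name;
            case ARRAY:
                return "[" + format(type.item, depth + 1) + "]";
            case MULTISET:
                return "{{" + format(type.item, depth + 1) + "}}";
            case NULLABLE:
                // A nullable wrapper adds no nesting level in this sketch.
                return format(type.item, depth) + "?";
            case RECORD: {
                List<String> parts = new ArrayList<>();
                for (Map.Entry<String, Type> e : type.fields.entrySet()) {
                    parts.add(e.getKey() + ": " + format(e.getValue(), depth + 1));
                }
                return "{" + String.join(", ", parts) + "}";
            }
            default:
                throw new IllegalStateException("unknown kind: " + type.kind);
        }
    }
}
```

In this sketch, `TypeFormatSketch.format(Type.array(Type.primitive("string")))` yields `[string]`, and anything nested more than four levels deep collapses to the truncation marker so prompt strings stay bounded.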
### Example output
Dataset TweetMessages (tweetid: int64 [PK], sender-location: any,
send-time: datetime, referred-topics: [string], message-text: string,
author-id: int64)
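
The example above could be assembled roughly as follows. This is an illustrative sketch only: the class names `ColumnInfo` and `DatasetSchema` and the method `toDescriptionString()` on `ColumnInfo` come from this PR, but the constructors, fields, the dataset-level `toDescriptionString()`, and the `SchemaDescriptionSketch` wrapper are assumptions and may differ from the actual patch. It also assumes the description is a single line with columns joined by `", "`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch; constructors and fields are assumed, not taken from the patch. */
final class SchemaDescriptionSketch {

    static final class ColumnInfo {
        final String name;
        final String type;
        final boolean primaryKey;

        ColumnInfo(String name, String type, boolean primaryKey) {
            this.name = name;
            this.type = type;
            this.primaryKey = primaryKey;
        }

        /** Renders "name: type", with a " [PK]" suffix for primary-key columns. */
        String toDescriptionString() {
            return name + ": " + type + (primaryKey ? " [PK]" : "");
        }
    }

    static final class DatasetSchema {
        final String datasetName;
        final List<ColumnInfo> columns;

        DatasetSchema(String datasetName, List<ColumnInfo> columns) {
            this.datasetName = datasetName;
            this.columns = new ArrayList<>(columns);
        }

        /** Renders one description line per Dataset (hypothetical method). */
        String toDescriptionString() {
            List<String> parts = new ArrayList<>();
            for (ColumnInfo column : columns) {
                parts.add(column.toDescriptionString());
            }
            return "Dataset " + datasetName + " (" + String.join(", ", parts) + ")";
        }
    }

    /** Builds the TweetMessages schema from the example output in this PR. */
    static DatasetSchema tweetMessages() {
        return new DatasetSchema("TweetMessages", Arrays.asList(
                new ColumnInfo("tweetid", "int64", true),
                new ColumnInfo("sender-location", "any", false),
                new ColumnInfo("send-time", "datetime", false),
                new ColumnInfo("referred-topics", "[string]", false),
                new ColumnInfo("message-text", "string", false),
                new ColumnInfo("author-id", "int64", false)));
    }
}
```

Calling `SchemaDescriptionSketch.tweetMessages().toDescriptionString()` reproduces the example line, which makes it a convenient fixture shape for a formatter unit test.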
### Testing
All unit tests pass: `mvn test -pl asterixdb/asterix-spidersilk`
