Hi Suryaa, Mike, and the AsterixDB community,

Thank you for the detailed discussion in the Tanya Rai thread. I read through your responses carefully and want to share how your feedback is shaping my proposal.

Three things stood out to me:

1. Privacy-first design: You mentioned that data should never be sent to external APIs like Claude or OpenAI. My proposal already includes Ollama support for local model inference, but I want to make this the default rather than an optional add-on. The LLM bridge in my design will treat local models as the primary path, with external APIs as an opt-in for users who explicitly accept that tradeoff.

2. SQLPP.jj as the knowledge base: I was not aware of this file before your response to Tanya. I have since located it at asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj and read through the grammar. This changes my approach to few-shot prompting significantly. Rather than hand-writing NL to SQL++ examples, I can parse SQLPP.jj to auto-generate grammar-grounded examples that cover constructs like GROUP AS, UNNEST, and positional access, which generic SQL training data will miss entirely.

3. SQL++ to NL (reverse direction): Mike, your suggestion about a reverse explanation capability is exactly the kind of verification layer that makes NL-to-SQL safe for non-expert users. I am considering scoping this as a second deliverable in my proposal: the system generates a query, then explains it back in plain English before executing it, so the user can confirm intent.

I also tried reaching Suryaa at [email protected] directly, but the message bounced. I am glad to continue the conversation here.

One question: are the existing SQL++ test cases in the repo located in asterixdb/asterix-sqlpp/src/test/resources/runtimets? I want to use them as the ground truth for validator integration tests.

Thank you,
Muhammad Abdullah
GitHub: github.com/abdullahiftikharcode
Email: [email protected]

On Tue, Mar 24, 2026 at 9:58 AM Muhammad Abdullah Iftikhar <[email protected]> wrote:

> Hi AsterixDB community,
>
> My name is Muhammad Abdullah, a Computer Science student at Information
> Technology University (ITU), Lahore, Pakistan (CGPA: 3.91/4.00). I have
> submitted a GSoC 2026 proposal for the NL2SQL++ Intelligent Query Assistant
> project.
>
> A bit about my relevant background:
> - Built Make-It-RAG: a natural language to SQL system using Google Gemini
>   with a React chat interface
> - Built Klebbix: a production RAG system with hybrid search over Qdrant +
>   Azure OpenAI, processing 8+ file formats at 60+ req/min
> - Strong Python, FastAPI, and LLM integration experience
>
> To prepare, I ran AsterixDB locally via Docker, explored the TinySocial
> sample dataset in SQL++, and reviewed the /query/service REST API. I also
> just submitted PR #41 (https://github.com/apache/asterixdb/pull/41)
> fixing a couple of typos in the README, a small first step to get familiar
> with the contribution workflow.
>
> My proposal covers:
> 1. A schema-aware RAG context engine that ingests AsterixDB metadata and
>    retrieves relevant schema fragments per query
> 2. A model-agnostic LLM bridge (OpenAI, Gemini, Ollama) with
>    SQL++-specific few-shot prompting
> 3. A query validator using AsterixDB's /query/service explain mode with a
>    self-correction loop
> 4. A REST service + minimal web UI, with full integration tests against
>    TinySocial and Yelp sample datasets
>
> One specific question for the community: is there an existing SQL++ parser
> test suite in the repo that would be the right target for validation test
> cases in this project?
>
> Thank you,
> Muhammad Abdullah
> GitHub: github.com/abdullahiftikharcode
> Email: [email protected]
>
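
For illustration only, the local-first policy from point 1 of the reply (local models as the default path, external APIs only on explicit opt-in) could be sketched in Python roughly as follows. This is not code from the proposal: `BridgeConfig`, `resolve_backend`, and the `allow_external` flag are hypothetical names, and the backend labels simply reuse the three providers named in the quoted proposal.

```python
from dataclasses import dataclass

# Hypothetical backend labels, taken from the providers named in the proposal.
LOCAL_BACKENDS = {"ollama"}
EXTERNAL_BACKENDS = {"openai", "gemini"}


@dataclass(frozen=True)
class BridgeConfig:
    """Hypothetical settings object for the LLM bridge."""
    backend: str = "ollama"        # local inference is the default path
    allow_external: bool = False   # the user must opt in explicitly


def resolve_backend(config: BridgeConfig) -> str:
    """Return the backend to use, refusing external APIs without opt-in."""
    name = config.backend.lower()
    if name in LOCAL_BACKENDS:
        return name
    if name in EXTERNAL_BACKENDS:
        if not config.allow_external:
            # Fail closed: data must not leave the machine by accident.
            raise PermissionError(
                f"backend '{name}' sends data to an external API; "
                "set allow_external=True to accept that tradeoff"
            )
        return name
    raise ValueError(f"unknown backend '{name}'")
```

With no configuration at all, `resolve_backend(BridgeConfig())` selects the local backend, while requesting an external provider without the opt-in flag raises an error instead of silently falling through. Failing closed is one possible design choice here; a warning plus a fallback to the local model would be another.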
