Re: [GSoC 2026] NL2SQL++ Proposal Introduction — Muhammad Abdullah

Muhammad Abdullah Iftikhar Tue, 24 Mar 2026 01:34:42 -0700

Hi Suryaa, Mike, and the AsterixDB community,

Thank you for the detailed discussion in the Tanya Rai thread. I read
through your responses carefully and want to share how your feedback is
shaping my proposal.

Three things stood out to me:

1. Privacy-first design: You mentioned that data should never be sent to
external APIs such as Claude or OpenAI. My proposal already includes Ollama
support for local model inference, but I want to make this the default
rather than an optional add-on. The LLM bridge in my design will treat
local models as the primary path, with external APIs as an opt-in for users
who explicitly accept that tradeoff.

2. SQLPP.jj as the knowledge base: I was not aware of this file before your
response to Tanya. I have since located it in
asterixdb/asterix-lang-sqlpp/src/main/javacc/SQLPP.jj and read through the
grammar. This changes my approach to few-shot prompting significantly.
Rather than hand-writing NL-to-SQL++ examples, I can parse SQLPP.jj to
auto-generate grammar-grounded examples covering constructs such as GROUP
AS, UNNEST, and positional access, which generic SQL training data rarely
covers.

3. SQL++ to NL (reverse direction): Mike, your suggestion about a reverse
explanation capability is exactly the kind of verification layer that makes
NL-to-SQL safe for non-expert users. I am considering scoping this as a
second deliverable in my proposal, where the system generates a query and
then explains it back in plain English before executing it, so the user can
confirm intent.
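For point 1, here is a minimal Python sketch of a local-first provider switch. The names BridgeConfig, resolve_provider, and build_ollama_request, and the default model name "llama3", are hypothetical. The only external interface it assumes is Ollama's local /api/generate endpoint on its usual port 11434. Any external provider raises an error unless allow_external is set explicitly.

```python
import json
import urllib.request
from dataclasses import dataclass

# Providers that would send prompts (and any schema or data inside them) off the host.
EXTERNAL_PROVIDERS = {"openai", "gemini", "claude"}
LOCAL_PROVIDERS = {"ollama"}


@dataclass
class BridgeConfig:
    provider: str = "ollama"          # local inference is the default path
    model: str = "llama3"             # hypothetical default model name
    allow_external: bool = False      # external APIs require an explicit opt-in
    ollama_url: str = "http://localhost:11434"


def resolve_provider(cfg: BridgeConfig) -> str:
    """Return the provider to use, refusing external APIs unless opted in."""
    name = cfg.provider.lower()
    if name in LOCAL_PROVIDERS:
        return name
    if name in EXTERNAL_PROVIDERS:
        if not cfg.allow_external:
            raise PermissionError(
                f"provider '{name}' sends data off-host; set allow_external=True to opt in"
            )
        return name
    raise ValueError(f"unknown provider: {name}")


def build_ollama_request(cfg: BridgeConfig, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request to Ollama's /api/generate."""
    payload = {"model": cfg.model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{cfg.ollama_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can be sent with urllib.request.urlopen(). With "stream" set to False, Ollama returns a single JSON object whose "response" field holds the generated text.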
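For point 2, the sketch below shows one simple way to ground few-shot examples in the grammar. It is a simpler technique than full generation from the grammar. It pulls keyword token names out of SQLPP.jj and keeps only the seed NL/SQL++ pairs whose construct the grammar defines. It assumes keyword tokens are written in the common JavaCC form <NAME : "literal">, which should be checked against the actual file. SEED_EXAMPLES and its TinySocial queries are illustrative and have not been validated against a running instance.

```python
import re

# Assumption: keyword tokens in SQLPP.jj use the common JavaCC form <NAME : "literal">,
# e.g. <UNNEST : "unnest">. Check this against the real grammar file.
TOKEN_RE = re.compile(r'<\s*([A-Z_]+)\s*:\s*"([A-Za-z_]+)"\s*>')

# Hypothetical seed examples over the TinySocial sample data, keyed by the
# grammar token each one exercises.
SEED_EXAMPLES = {
    "UNNEST": (
        "List each user's name with every organization they worked for.",
        "SELECT u.name, e.organizationName "
        "FROM GleambookUsers u UNNEST u.employment e;",
    ),
    "AT": (
        "List each user's employers with their position in the employment list.",
        "SELECT u.name, pos, e.organizationName "
        "FROM GleambookUsers u UNNEST u.employment e AT pos;",
    ),
    "GROUP": (
        "For each author, collect the text of all their messages.",
        "SELECT uid, (SELECT VALUE x.m.message FROM g AS x) AS messages "
        "FROM GleambookMessages m GROUP BY m.authorId AS uid GROUP AS g;",
    ),
}


def grammar_keywords(grammar_text: str) -> set[str]:
    """Collect the names of tokens defined as plain keyword literals."""
    return {name for name, _literal in TOKEN_RE.findall(grammar_text)}


def few_shot_examples(grammar_text: str) -> list[tuple[str, str]]:
    """Keep only seed examples whose construct the grammar actually defines."""
    keywords = grammar_keywords(grammar_text)
    return [pair for token, pair in sorted(SEED_EXAMPLES.items()) if token in keywords]


def format_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Render the examples as Q/SQL++ pairs followed by the user's question."""
    shots = "\n\n".join(f"Q: {q}\nSQL++: {s}" for q, s in examples)
    return f"{shots}\n\nQ: {question}\nSQL++:"
```

In practice the input would be the full text of SQLPP.jj, read from the path given in point 2. Filtering by grammar tokens keeps the prompt from showing constructs the parser would reject.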
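For point 3, the generate, explain, confirm, execute flow can be written as a small function that takes each step as a callable. This lets the LLM calls and the UI prompt be swapped out or stubbed in tests. Outcome and confirm_then_execute are hypothetical names. The property that matters is that execute is never called unless the user confirms.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    query: str
    explanation: str
    executed: bool
    result: Optional[object] = None


def confirm_then_execute(
    question: str,
    generate: Callable[[str], str],        # NL -> SQL++ (hypothetical LLM call)
    explain: Callable[[str], str],         # SQL++ -> plain English (hypothetical LLM call)
    confirm: Callable[[str, str], bool],   # shows query + explanation, returns user's answer
    execute: Callable[[str], object],      # runs the query, e.g. via /query/service
) -> Outcome:
    """Generate a query, explain it back, and run it only after user confirmation."""
    query = generate(question)
    explanation = explain(query)
    # Nothing runs until the user agrees the explanation matches their intent.
    if not confirm(query, explanation):
        return Outcome(query, explanation, executed=False)
    return Outcome(query, explanation, executed=True, result=execute(query))
```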

I also tried reaching Suryaa directly at [email protected], but the
message bounced. I am glad to continue the conversation here.

One question: are the existing SQL++ test cases in the repo located under
asterixdb/asterix-sqlpp/src/test/resources/runtimets? I want to use them as
ground truth for the validator integration tests.
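If the suite does live under a runtimets directory, its query files could be collected as in the sketch below. It assumes a queries_sqlpp subdirectory and test statements stored in files ending in .query.sqlpp. Both details need to be confirmed against the repo, and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterator


def collect_query_cases(runtimets_root: str) -> dict[str, list[Path]]:
    """Group *.query.sqlpp files by the test directory that contains them."""
    root = Path(runtimets_root) / "queries_sqlpp"   # assumed layout
    cases: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*.query.sqlpp")):
        test_name = path.parent.relative_to(root).as_posix()
        cases.setdefault(test_name, []).append(path)
    return cases


def iter_query_texts(runtimets_root: str) -> Iterator[tuple[str, str]]:
    """Yield (test_name, query_text) for every collected query file."""
    for test_name, paths in collect_query_cases(runtimets_root).items():
        for path in paths:
            yield test_name, path.read_text(encoding="utf-8")
```

There are two caveats. First, some tests in the suite are expected to fail, so a real harness would need to read the suite's test definitions rather than treat every query file as valid. Second, most queries depend on DDL from earlier files in the same test directory, and that DDL would have to run before validation.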

Thank you,
Muhammad Abdullah
GitHub: github.com/abdullahiftikharcode
Email: [email protected]

On Tue, Mar 24, 2026 at 9:58 AM Muhammad Abdullah Iftikhar <
[email protected]> wrote:

> Hi AsterixDB community,
>
> I am Muhammad Abdullah, a Computer Science student at Information
> Technology University (ITU), Lahore, Pakistan (CGPA: 3.91/4.00). I have
> submitted a GSoC 2026 proposal for the NL2SQL++ Intelligent Query Assistant
> project.
>
> A bit about my relevant background:
> - Built Make-It-RAG: a natural-language-to-SQL system using Google Gemini
> with a React chat interface
> - Built Klebbix: a production RAG system with hybrid search over Qdrant +
> Azure OpenAI, processing 8+ file formats at 60+ req/min
> - Strong Python, FastAPI, and LLM integration experience
>
> To prepare, I ran AsterixDB locally via Docker, explored the TinySocial
> sample dataset in SQL++, and reviewed the /query/service REST API. I also
> just submitted PR #41 (https://github.com/apache/asterixdb/pull/41)
> fixing a couple of typos in the README, a small first step toward getting
> familiar with the contribution workflow.
>
> My proposal covers:
> 1. A schema-aware RAG context engine that ingests AsterixDB metadata and
> retrieves relevant schema fragments per query
> 2. A model-agnostic LLM bridge (OpenAI, Gemini, Ollama) with
> SQL++-specific few-shot prompting
> 3. A query validator using AsterixDB's /query/service explain mode with a
> self-correction loop
> 4. A REST service + minimal web UI, with full integration tests against
> TinySocial and Yelp sample datasets
>
> One specific question for the community: is there an existing SQL++ parser
> test suite in the repo that would be the right target for validation test
> cases in this project?
>
> Thank you,
> Muhammad Abdullah
> GitHub: github.com/abdullahiftikharcode
> Email: [email protected]
>
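Item 3 of the quoted summary, the validator with a self-correction loop, could be sketched as follows. The sketch makes three assumptions. First, /query/service accepts a form-encoded "statement" parameter. Second, it reports problems in an "errors" array of objects with a "msg" field. Third, prefixing a statement with EXPLAIN compiles it to a plan without running it. Port 19002 is used here as AsterixDB's usual HTTP API port. generate stands in for a hypothetical LLM call that receives the previous round's error messages as feedback.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional


def explain_request(base_url: str, query: str) -> urllib.request.Request:
    """Build a POST to /query/service that asks for a plan instead of results (assumed EXPLAIN support)."""
    body = urllib.parse.urlencode({"statement": f"EXPLAIN {query}"}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/query/service", data=body, method="POST")


def validate(base_url: str, query: str) -> list[str]:
    """Return the error messages the server reports for the query (empty if none)."""
    try:
        with urllib.request.urlopen(explain_request(base_url, query)) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        payload = json.load(err)  # compilation errors may arrive with a 4xx status
    return [e.get("msg", "") for e in payload.get("errors", [])]


def self_correct(
    question: str,
    generate: Callable[[str, Optional[str]], str],  # hypothetical: (question, feedback) -> SQL++
    check: Callable[[str], list[str]],              # e.g. lambda q: validate(url, q)
    max_attempts: int = 3,
) -> tuple[str, list[str]]:
    """Regenerate until the validator reports no errors or attempts run out."""
    feedback: Optional[str] = None
    query, errors = "", []
    for _ in range(max_attempts):
        query = generate(question, feedback)
        errors = check(query)
        if not errors:
            break
        # Feed the engine's own error messages into the next generation attempt.
        feedback = "\n".join(errors)
    return query, errors
```

Usage against a local instance would be self_correct(question, generate, lambda q: validate("http://localhost:19002", q)). If the attempts run out, the last query is returned along with its errors, and the caller can show those errors instead of running the query.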
