Hi TVM folks,
I wanted to share a small language-runtime experiment and ask whether systems
like this fit naturally into the compiler/runtime co-design conversation.
We built a public demo line called Engram and deployed it on a commodity
ESP32-C3.
Current public numbers:
* Host-side benchmark capability
  * `LogiQA = 0.392523`
  * `IFEval = 0.780037`
* Published board proof
  * `LogiQA 642 = 249 / 642 = 0.3878504672897196`
  * `host_full_match = 642 / 642`
  * runtime artifact size = `1,380,771 bytes`
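
For anyone who wants to check the board figures, here is a minimal Python sketch of the arithmetic. Only the three numbers are taken from the published results; the variable names are illustrative.

```python
# Figures copied from the published board proof; names are illustrative.
correct, total = 249, 642
artifact_bytes = 1_380_771

# Accuracy is simply correct answers over total items.
board_accuracy = correct / total

# Convert the artifact size to binary units for comparison with flash sizes.
artifact_kib = artifact_bytes / 1024
artifact_mib = artifact_bytes / (1024 * 1024)

print(f"board LogiQA accuracy: {board_accuracy:.6f}")
print(f"artifact size: {artifact_kib:.1f} KiB ({artifact_mib:.2f} MiB)")
```
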
Important scope note:
This is **not** presented as unrestricted, open-input native LLM generation on
an MCU.
The board-side path is closer to a flash-resident, table-driven runtime with:
* packed token weights
* hashed lookup structures
* fixed compiled probe batches
* streaming fold / checksum style execution over precompiled structures
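
To make the shape of such a runtime concrete, here is a rough Python sketch of the four ingredients listed above. It is not taken from the Engram repo: the FNV-1a hash, the open-addressed table layout, the int8 weights, and every function name (`build_table`, `lookup`, `run_batch`) are assumptions chosen for illustration only.

```python
# Hypothetical sketch; none of these names or layouts come from the Engram repo.
from array import array

FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193


def fnv1a32(data: bytes) -> int:
    """32-bit FNV-1a hash, a common choice for small lookup tables."""
    h = FNV_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def build_table(weights: dict, slots: int):
    """Offline step: pack token -> int8 weight into fixed-size arrays."""
    assert len(weights) < slots, "need at least one empty slot"
    keys = array("L", [0] * slots)  # stored hash per slot, 0 marks empty
    vals = array("b", [0] * slots)  # packed signed 8-bit weights
    for token, w in weights.items():
        h = fnv1a32(token.encode()) or 1  # reserve 0 as the empty marker
        i = h % slots
        while keys[i] != 0:  # linear probing on collision
            i = (i + 1) % slots
        keys[i] = h
        vals[i] = w
    return keys, vals


def lookup(keys, vals, token: str) -> int:
    """Runtime step: hashed lookup; only hashes are compared, not strings."""
    h = fnv1a32(token.encode()) or 1
    i = h % len(keys)
    while keys[i] != 0:
        if keys[i] == h:
            return vals[i]
        i = (i + 1) % len(keys)
    return 0  # unknown tokens contribute nothing


def run_batch(keys, vals, probes):
    """Stream over a fixed probe batch, folding scores into a checksum."""
    checksum = FNV_OFFSET
    scores = []
    for probe in probes:
        score = sum(lookup(keys, vals, token) for token in probe)
        scores.append(score)
        checksum ^= score & 0xFFFFFFFF
        checksum = (checksum * FNV_PRIME) & 0xFFFFFFFF
    return scores, checksum


keys, vals = build_table({"yes": 3, "no": -2, "maybe": 1}, slots=8)
scores, checksum = run_batch(keys, vals, [("yes", "maybe"), ("no", "unknown")])
print(scores, hex(checksum))
```

In this sketch the tables are built once offline and only read at run time, which is what would let them live in flash, and the per-batch checksum gives a single value that a board run and a host run could be compared on.
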
So this is not a conventional dense graph compiler story. It is closer to a
task-specialized language runtime whose behavior has been pushed into a compact
executable form under severe memory constraints.
Repo:
https://github.com/Alpha-Guardian/Engram
What I’m curious about is whether people here would think of systems like
this as:
* outside the normal ML compiler scope
* an adjacent compiler/runtime co-design problem
* or evidence that some language-task systems may want a very different
compiled execution form than standard graph runtimes
Would be very interested in any thoughts.
