polyzos commented on PR #2264:
URL: https://github.com/apache/fluss/pull/2264#issuecomment-3708218285
@Priyamanjare54 this is great work 👌 I think before merging we can just add
a few things as “summaries”, like a quick section at the beginning on “how to
think about encodings”:
## How to Think About Encodings in Fluss
In Fluss, a data encoding primarily determines:
- **How data is laid out on disk (columnar vs row-oriented)**
- **How efficiently data can be filtered, projected, and scanned**
- **Whether the encoding is optimized for streaming scans or key-based access**
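To make the columnar vs row-oriented point concrete, the section could include a toy sketch like this one. It is plain Python, not Fluss code. `to_row_layout`, `to_columnar_layout`, `project_rows`, `project_columns` and the record fields are all hypothetical, and the sketch only counts values read. It does not model the real ARROW or COMPACTED binary formats.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical example records; field names are illustrative only.
FIELDS = ["id", "user", "amount"]
RECORDS: List[Dict[str, Any]] = [
    {"id": 1, "user": "alice", "amount": 10.0},
    {"id": 2, "user": "bob", "amount": 25.5},
    {"id": 3, "user": "carol", "amount": 7.25},
]


def to_row_layout(records: List[Dict[str, Any]]) -> List[Tuple[Any, ...]]:
    """Toy row-oriented layout: each record's fields are stored together."""
    return [tuple(r[f] for f in FIELDS) for r in records]


def to_columnar_layout(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Toy columnar layout: all values of one field are stored together."""
    return {f: [r[f] for r in records] for f in FIELDS}


def project_rows(rows: List[Tuple[Any, ...]], column: str) -> Tuple[List[Any], int]:
    """Projecting one column from a row layout still reads every field."""
    idx = FIELDS.index(column)
    values_read = 0
    out = []
    for row in rows:
        values_read += len(row)  # the whole record is read to get one field
        out.append(row[idx])
    return out, values_read


def project_columns(cols: Dict[str, List[Any]], column: str) -> Tuple[List[Any], int]:
    """Projecting one column from a columnar layout reads only that column."""
    values = cols[column]  # other columns are never touched (column pruning)
    return list(values), len(values)


print(project_rows(to_row_layout(RECORDS), "amount"))         # ([10.0, 25.5, 7.25], 9)
print(project_columns(to_columnar_layout(RECORDS), "amount"))  # ([10.0, 25.5, 7.25], 3)
```

Both layouts return the same values, but the row layout reads 9 values to answer a one-column query while the columnar layout reads 3. That gap is what column pruning buys for scans.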
As a result, the choice of encoding governs:
- CPU vs IO tradeoffs
- Scan-heavy vs lookup-heavy workloads
- Analytical vs operational access patterns
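The lookup-heavy side of the tradeoff can be illustrated with a similar toy model. Again, this is a hypothetical sketch, not Fluss internals. `ROW_KV`, `COLUMNS`, `lookup_row` and `lookup_columnar` are made-up names, and "segments touched" is only a rough proxy for IO.

```python
from typing import Any, Dict, List, Tuple

FIELDS = ["id", "user", "amount"]

# Toy row-oriented KV store: primary key -> whole row stored contiguously.
ROW_KV: Dict[int, Tuple[Any, ...]] = {
    1: (1, "alice", 10.0),
    2: (2, "bob", 25.5),
    3: (3, "carol", 7.25),
}

# The same data laid out column by column.
COLUMNS: Dict[str, List[Any]] = {
    "id": [1, 2, 3],
    "user": ["alice", "bob", "carol"],
    "amount": [10.0, 25.5, 7.25],
}


def lookup_row(key: int) -> Tuple[Tuple[Any, ...], int]:
    """Key lookup on a row layout: one contiguous record holds the full row."""
    return ROW_KV[key], 1


def lookup_columnar(key: int) -> Tuple[Tuple[Any, ...], int]:
    """Key lookup on a columnar layout: find the position in the key column,
    then gather one value from every column to rebuild the row."""
    pos = COLUMNS["id"].index(key)  # linear search stands in for an index
    row = tuple(COLUMNS[f][pos] for f in FIELDS)
    return row, len(FIELDS)  # one segment per column (key column included)


print(lookup_row(2))       # ((2, 'bob', 25.5), 1)
print(lookup_columnar(2))  # ((2, 'bob', 25.5), 3)
```

The answer is identical. However, the columnar layout touches one segment per column to reassemble a full row, which is why a row-oriented encoding tends to suit key lookups and full-row reads.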
And then maybe we can add a table with the exact tradeoffs at the bottom of
the page.
## ARROW vs COMPACTED
| Dimension              | ARROW                           | COMPACTED                         |
|------------------------|---------------------------------|-----------------------------------|
| Physical layout        | Columnar                        | Row-oriented                      |
| Typical access pattern | Scans with projection & filters | Full-row reads or key lookups     |
| Column pruning         | ✅ Yes                          | ❌ No                             |
| Predicate pushdown     | ✅ Yes                          | ❌ No                             |
| Storage efficiency     | Good                            | Excellent                         |
| CPU efficiency         | Better for selective reads      | Better for full-row reads         |
| Log encoding           | ✅ Yes                          | ✅ Yes                            |
| KV encoding            | ❌ No                           | ✅ Yes                            |
| Best suited for        | Analytics, streaming analytics  | State tables, materialized views  |
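If it helps readers, the table could be condensed into a rule of thumb like the following. This is only a hypothetical sketch that restates the table. `Encoding` and `suggest_encoding` are made-up names, not a Fluss API, and the sketch ignores any encodings other than ARROW and COMPACTED.

```python
from enum import Enum


class Encoding(Enum):
    ARROW = "ARROW"
    COMPACTED = "COMPACTED"


def suggest_encoding(is_kv: bool, selective_reads: bool) -> Encoding:
    """Hypothetical rule of thumb restating the ARROW vs COMPACTED table.

    is_kv: the data is stored as a KV (primary-key) table rather than a log.
    selective_reads: queries typically project a few columns and/or filter.
    """
    if is_kv:
        # Of these two, only COMPACTED is marked as a KV encoding in the table.
        return Encoding.COMPACTED
    if selective_reads:
        # Column pruning and predicate pushdown favor the columnar ARROW layout.
        return Encoding.ARROW
    # Full-row reads favor the row-oriented COMPACTED layout.
    return Encoding.COMPACTED


print(suggest_encoding(is_kv=False, selective_reads=True).value)   # ARROW
print(suggest_encoding(is_kv=False, selective_reads=False).value)  # COMPACTED
print(suggest_encoding(is_kv=True, selective_reads=True).value)    # COMPACTED
```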
WDYT?