Hi Jingsong,

+1 on the opt-in design. Defaulting to no stats makes sense for wide-table workloads: computing min/max across thousands of columns adds non-trivial metadata overhead, and letting users explicitly opt in for frequently filtered columns (e.g. hot feature columns) strikes a good balance.
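
To make the opt-in idea concrete, here is a minimal Python sketch. It is not Mosaic or Paimon code: `build_stats`, `may_match`, and the `stats_columns` parameter are hypothetical names, and row groups are modeled as plain lists of dicts. Under those assumptions, it builds min/max only for the listed columns and uses them to skip row groups for a simple range predicate.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Stats = Dict[str, Tuple[Any, Any]]  # column name -> (min, max)


def build_stats(rows: List[Row], stats_columns: Iterable[str]) -> Stats:
    """Compute min/max only for opted-in columns (hypothetical helper)."""
    stats: Stats = {}
    for col in stats_columns:
        values = [r[col] for r in rows if r.get(col) is not None]
        if values:  # an all-NULL column gets no entry
            stats[col] = (min(values), max(values))
    return stats


def may_match(stats: Stats, col: str, lo: Any, hi: Any) -> bool:
    """Return False only when the stats prove no row has lo <= col <= hi."""
    if col not in stats:
        return True  # no stats for this column: the row group must be read
    col_min, col_max = stats[col]
    return not (col_max < lo or col_min > hi)


# Two row groups of a wide table; only "feature_ctr" is opted in.
row_groups = [
    [{"feature_ctr": 0.10, "image_url": "a"}, {"feature_ctr": 0.20, "image_url": "b"}],
    [{"feature_ctr": 0.70, "image_url": "c"}, {"feature_ctr": 0.90, "image_url": "d"}],
]
group_stats = [build_stats(rows, ["feature_ctr"]) for rows in row_groups]

# Predicate 0.5 <= feature_ctr <= 1.0: indexes of row groups that must be read.
selected = [
    i for i, s in enumerate(group_stats) if may_match(s, "feature_ctr", 0.5, 1.0)
]
print(selected)  # [1]
```

Columns that were not opted in, such as `image_url` here, carry no stats, so a filter on them never prunes a row group. That is the storage-versus-pruning trade-off the opt-in design makes.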

Best,
Dapeng

Jingsong Li <[email protected]> wrote on Wed, May 13, 2026 at 21:31:

> Hi Dapeng,
>
> We may be able to support filter pushdown, such as storing min max,
> specifying the columns that need to build stats, and not building them
> by default without occupying storage.
>
> Best,
> Jingsong
>
> On Wed, May 13, 2026 at 7:41 PM Jingsong Li <[email protected]> wrote:
> >
> > Thanks Dapeng for your feedback.
> >
> > - Schema evolution: Actually, this ability should be handled by the
> >   Paimon layer, which will evolve the schema based on the difference
> >   between the file's Schema ID and the currently read schema. However,
> >   the format itself should also have some ability to read based on
> >   column names, and columns without them will return NULL, and handle
> >   simple type changes, just like Parquet is used in Paimon.
> >
> > - Filter pushdown: The first version did not plan to carry out Filter
> >   PushDown, and perhaps we need to support specifying statistical
> >   information for certain columns in the future, but this is far away.
> >
> > - Repository: We will first incubate it in the Paimon community until
> >   the ecosystem is more robust, such as using it for other table
> >   formats, and then consider a separate repository.
> >
> > Best,
> > Jingsong
> >
> > On Wed, May 13, 2026 at 7:09 PM Dapeng Sun <[email protected]> wrote:
> > >
> > > Hi Jingsong,
> > >
> > > Thanks for sharing this. The design looks really promising for wide
> > > table scenarios.
> > >
> > > The projection latency numbers stand out in particular. 2.3ms for 1
> > > column out of 10,000 is a meaningful result, and the name-based
> > > bucketing aligns well with real-world patterns where columns tend to
> > > share common prefixes (e.g., feature stores or multi-modal metadata
> > > like `image_*`).
> > >
> > > A few questions as this evolves:
> > >
> > > - Schema evolution: How does Mosaic handle column additions or
> > >   renames? Since bucket assignment is range-based on column names, a
> > >   rename could shift a column across bucket boundaries; curious if
> > >   there's a planned strategy for that.
> > > - Filter pushdown: Is predicate pushdown on the roadmap, or is the
> > >   current focus primarily on projection? For feature serving
> > >   workloads, point lookups with filters could be another interesting
> > >   optimization target.
> > > - Repository: A standalone repo might make it easier for other
> > >   projects to adopt it independently, without taking on Paimon as a
> > >   dependency, though I'm curious how you're thinking about this.
> > >
> > > Looking forward to the RFC and seeing this develop further!
> > >
> > > Best,
> > > Dapeng
> > >
> > > Jingsong Li <[email protected]> wrote on Wed, May 13, 2026 at 18:00:
> > >
> > > > Hi everyone,
> > > >
> > > > I'd like to introduce a new file format for the wide table.
> > > >
> > > > Mosaic is a columnar-bucket hybrid format optimized for wide tables
> > > > (10,000+ columns). Columns are sorted by name and evenly distributed
> > > > into buckets using range-based assignment, stored column-oriented
> > > > within each bucket, and independently compressed. This enables
> > > > efficient projection pushdown at bucket granularity: reading 10
> > > > columns out of 10,000 only decompresses the buckets that contain
> > > > those 10 columns. Range-based assignment ensures that columns with
> > > > similar name prefixes land in the same bucket, improving both
> > > > compression ratio and projection locality.
> > > >
> > > > - Columns are grouped into buckets by name, enabling selective I/O:
> > > >   read only the buckets you need.
> > > > - Each column is automatically encoded as ALL_NULL, CONST, DICT, or
> > > >   PLAIN based on its data distribution.
> > > > - Optional Zstandard compression for both data buckets and the
> > > >   schema block, with configurable compression level.
> > > > - Byte Pair Encoding compresses column names in the schema block,
> > > >   reducing metadata overhead for wide tables.
> > > > - 18 data types from Boolean to TimestampLtz, with support for
> > > >   fixed-width and variable-length encodings.
> > > >
> > > > +--------------------------------------------+
> > > > | Row Group 0: Bucket Data                   |
> > > > |   [Bucket 0 compressed block]              |
> > > > |   [Bucket 3 compressed block]              |
> > > > |   ... (only non-empty buckets)             |
> > > > +--------------------------------------------+
> > > > | Row Group 1: Bucket Data                   |
> > > > |   ...                                      |
> > > > +--------------------------------------------+
> > > > | Schema Block                               |
> > > > |   [4 bytes: uncompressed size (BE int)]    |
> > > > |   [schema data (possibly compressed)]      |
> > > > +--------------------------------------------+
> > > > | Row Group Index (varint encoded)           |
> > > > +--------------------------------------------+
> > > > | Footer (32 bytes, fixed)                   |
> > > > +--------------------------------------------+
> > > >
> > > > Benchmark compared to Parquet and ORC:
> > > >
> > > > Test setup: 10,000 columns (90% STRING, 10% INT), column names ~80
> > > > bytes each, Zstd compression (level 9).
> > > >
> > > > **File Size (10 rows):**
> > > >
> > > > | Format  | Size     | vs Mosaic |
> > > > |---------|----------|-----------|
> > > > | Parquet | 9,696 KB | 14.8x     |
> > > > | ORC     | 6,377 KB | 9.7x      |
> > > > | Mosaic  | 654 KB   | 1x        |
> > > >
> > > > **Projection Read (500 rows):**
> > > >
> > > > | Projected Columns | Parquet   | ORC       | Mosaic    |
> > > > |-------------------|-----------|-----------|-----------|
> > > > | 10 / 10,000       | 53,170 us | 72,729 us | 25,081 us |
> > > > | 1 / 10,000        | 50,919 us | 70,712 us | 2,374 us  |
> > > >
> > > > File size: Parquet 57.4 MB, ORC 95.4 MB, Mosaic 11.5 MB
> > > >
> > > > **Projection Read (4,500 rows, ~458 MB Parquet):**
> > > >
> > > > | Projected Columns | Parquet    | ORC       | Mosaic    |
> > > > |-------------------|------------|-----------|-----------|
> > > > | 10 / 10,000       | 369,627 us | 89,344 us | 67,314 us |
> > > > | 1 / 10,000        | 360,458 us | 81,934 us | 26,924 us |
> > > >
> > > > File size: Parquet 458.4 MB, ORC 827.9 MB, Mosaic 100.2 MB
> > > >
> > > > When projecting a small subset of columns, Mosaic only decompresses
> > > > the buckets containing the requested columns, avoiding I/O on the
> > > > remaining data.
> > > >
> > > > POC is in https://github.com/JingsongLi/paimon/tree/fast_format
> > > >
> > > > We may need to create a separate repo for it.
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Jingsong
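
For readers following the thread, the range-based bucket assignment described in the original proposal can be sketched roughly as below. This is an illustration in Python, not the POC's code: `assign_buckets` and `buckets_to_read` are hypothetical names, the column names are invented, and the even split over the sorted name list is an assumption about what "evenly distributed" means.

```python
from typing import Dict, Iterable, Set


def assign_buckets(column_names: Iterable[str], num_buckets: int) -> Dict[str, int]:
    """Sort names, then cut the sorted list into num_buckets contiguous ranges."""
    ordered = sorted(column_names)
    n = len(ordered)
    return {name: i * num_buckets // n for i, name in enumerate(ordered)}


def buckets_to_read(assignment: Dict[str, int], projection: Iterable[str]) -> Set[int]:
    """Buckets that must be decompressed to serve the projected columns."""
    return {assignment[name] for name in projection}


# 10,000 invented columns in four prefix families, split into 100 buckets.
columns = [
    f"{prefix}_{i:04d}"
    for prefix in ("audio", "image", "text", "user")
    for i in range(2500)
]
assignment = assign_buckets(columns, num_buckets=100)

one = buckets_to_read(assignment, ["image_0007"])
ten = buckets_to_read(assignment, [f"image_{i:04d}" for i in range(10)])
print(len(one), len(ten))  # 1 1: neighbouring names share a bucket

# A rename changes the column's position in the sort order, so the column
# can land in a different bucket (the schema-evolution concern raised above).
renamed = [c for c in columns if c != "image_0007"] + ["thumbnail_0007"]
old_bucket = assignment["image_0007"]
new_bucket = assign_buckets(renamed, 100)["thumbnail_0007"]
print(old_bucket, new_bucket)
```

In this sketch, ten columns sharing a prefix resolve to a single bucket, which is the projection locality the proposal relies on. The rename example shows why bucket membership cannot be derived from the current column name alone once files already exist.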

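
The per-column encoding choice (ALL_NULL, CONST, DICT, or PLAIN) mentioned in the proposal could look something like the following. The thread does not state the actual selection rules, so everything here is assumed: `choose_encoding` is a hypothetical function, and the `dict_ratio` threshold is made up for illustration.

```python
from typing import Any, List, Optional


def choose_encoding(values: List[Optional[Any]], dict_ratio: float = 0.5) -> str:
    """Pick an encoding from the value distribution of one column chunk.

    dict_ratio is an invented threshold: DICT is chosen when the number of
    distinct values is at most this fraction of the row count.
    """
    if all(v is None for v in values):
        return "ALL_NULL"
    distinct = set(values)
    if len(distinct) == 1:
        return "CONST"  # the same non-null value in every row
    if len(distinct) <= dict_ratio * len(values):
        return "DICT"
    return "PLAIN"


examples = {
    "never_set": [None, None, None, None],
    "country": ["cn", "cn", "cn", "cn"],
    "device": ["ios", "android", "ios", "android", "ios", "android"],
    "user_id": ["u1", "u2", "u3", "u4"],
}
for name, column in examples.items():
    print(name, choose_encoding(column))
```

With the assumed threshold, the four sample columns map to ALL_NULL, CONST, DICT, and PLAIN respectively. In a sparse wide table many columns would fall into the first two cases, which store almost nothing per row.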