zclllyybb commented on issue #64244: URL: https://github.com/apache/doris/issues/64244#issuecomment-4649076538
Initial read: this is a valid feature request, but current upstream master does not support it. I checked upstream/master at `ab930cd39b7` from 2026-06-08. There is no `row_store_only` property in FE property analysis or in the BE tablet schema/protobuf/thrift definitions. The sample DDL would therefore currently be rejected during create-table property validation as an unknown property.

What exists today:

1. `store_row_column=true` and/or `row_store_columns=...` add the hidden `__DORIS_ROW_STORE_COL__` column and persist `store_row_column` plus `row_store_column_unique_ids` in tablet schema metadata.
2. When `row_store_column_unique_ids` is empty, BE treats the table as having a full row store. When it is non-empty, only the selected column unique ids are encoded into the row-store payload.
3. This is still an additional row-store column, not a replacement for columnar storage. `SegmentWriter` and `VerticalSegmentWriter` serialize the row-store value from the input block, then still create column writers for the non-row-store schema columns. Therefore the current mechanism cannot deliver the requested "primary key/index + row-store payload only" storage reduction.
4. Point-query execution can read and decode the row-store payload. If the payload does not cover all requested columns, it falls back to column-store reads when `enable_short_circuit_query_access_column_store` is enabled, or returns an error when it is disabled. A row-store-only table cannot depend on that fallback.

Suggested next step: treat this as a new storage-mode design, not as a small property alias. The first safe scope should probably be explicit. For example: UNIQUE KEY MoW only, `store_row_column=true`, full row store only (`row_store_columns` unset or listing all visible columns), and no secondary materialized indexes/rollups until their behavior is defined.

Implementation points that need to be specified and covered:

1. FE DDL validation and metadata: add `row_store_only` as a table property and persist it through table properties/tablet schema/thrift/proto. Include show-create/restore/CCR compatibility. Reject incompatible combinations, such as partial `row_store_columns`, unless a partial row-store-only mode is intentionally designed.
2. Write path: change segment writers, vertical writers, compaction, and schema-change writers so they intentionally skip non-key value column writers. Key columns, hidden delete/sequence/version/skip-bitmap semantics, the primary-key index, and the row-store column must remain correct.
3. Read path: define whether normal OLAP scans and lightweight analytics decode from `__DORIS_ROW_STORE_COL__` or are rejected/routed differently. Current column-scan code expects per-column segment data, while the row-store decode path is specific to point-query/row-fetch flows.
4. Operations/tests: cover load, compaction, schema change, partial update/delete, point-query `SELECT *` and projection, `enable_short_circuit_query_access_column_store=false`, backup/restore/CCR, and upgrade/downgrade behavior.

Missing information that would make the design review easier:

- the target release/version;
- whether this must support existing tables via `ALTER` or only new tables;
- the expected "lightweight analytics" subset;
- storage/latency benchmarks versus the current `store_row_column=true`;
- whether secondary indexes/materialized views must be supported in the first version.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
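Points 1 and 2 under "What exists today" describe how `row_store_column_unique_ids` selects what goes into today's row-store payload. The Python sketch below illustrates that selection rule under a simplified schema model. `Column`, `row_store_payload_columns`, and the `has_row_store_col` flag are hypothetical names, not Doris APIs. It is also an assumption that the row-store column never encodes itself.

```python
from dataclasses import dataclass

# Name of the hidden row-store column, as given in the comment.
ROW_STORE_COL = "__DORIS_ROW_STORE_COL__"


@dataclass(frozen=True)
class Column:
    # Hypothetical, simplified stand-in for a tablet-schema column.
    name: str
    unique_id: int


def row_store_payload_columns(columns, has_row_store_col, row_store_column_unique_ids):
    """Return the columns whose values would be encoded into the row-store payload."""
    if not has_row_store_col:
        return []  # no hidden row-store column, so nothing is encoded
    # Assumption: the row-store column is never part of its own payload.
    candidates = [c for c in columns if c.name != ROW_STORE_COL]
    if not row_store_column_unique_ids:
        return candidates  # empty id list: treated as a full row store
    wanted = set(row_store_column_unique_ids)
    return [c for c in candidates if c.unique_id in wanted]


schema = [Column("k", 0), Column("v1", 1), Column("v2", 2), Column(ROW_STORE_COL, 3)]
full = [c.name for c in row_store_payload_columns(schema, True, [])]
partial = [c.name for c in row_store_payload_columns(schema, True, [2])]
print(full)     # ['k', 'v1', 'v2']
print(partial)  # ['v2']
```

Either mode only decides what goes into the extra payload. Per point 3, the per-column data is still written alongside it.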
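Point 4 under "What exists today" is the reason a row-store-only table cannot lean on the existing fallback. The sketch below models that read decision. `plan_point_query_read`, `ShortCircuitReadError`, and the `has_column_store` flag are hypothetical. `access_column_store` only mirrors `enable_short_circuit_query_access_column_store`. It is also an assumption that only the missing columns, rather than the whole row, are read from column storage.

```python
class ShortCircuitReadError(Exception):
    """Hypothetical error raised when a point query cannot be served."""


def plan_point_query_read(requested, payload_columns, access_column_store, has_column_store=True):
    """Split requested columns into row-store decodes and column-store reads.

    has_column_store=False models a hypothetical row-store-only table that
    has no per-column value data to fall back to.
    """
    from_row_store = [c for c in requested if c in payload_columns]
    missing = [c for c in requested if c not in payload_columns]
    if missing and not access_column_store:
        raise ShortCircuitReadError(f"row store does not cover {missing}; column-store access disabled")
    if missing and not has_column_store:
        raise ShortCircuitReadError(f"row store does not cover {missing}; no column store to fall back to")
    return {"from_row_store": from_row_store, "from_column_store": missing}


payload = {"k", "v1"}
print(plan_point_query_read(["k", "v1"], payload, access_column_store=False))
# {'from_row_store': ['k', 'v1'], 'from_column_store': []}
print(plan_point_query_read(["k", "v2"], payload, access_column_store=True))
# {'from_row_store': ['k'], 'from_column_store': ['v2']}
```

In this model, a row-store-only table only works when the payload always covers every requested column. That is one reason the suggested first scope requires a full row store.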
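The "first safe scope" and implementation point 1 amount to a small set of DDL validation rules. The sketch below writes them down as executable checks. It assumes properties arrive as a string-to-string map. `validate_row_store_only`, the `keys_type` string, and the error messages are all hypothetical; none of them is the FE's actual property-analysis code.

```python
def validate_row_store_only(props, keys_type, merge_on_write, visible_columns, rollups):
    """Return error messages for a hypothetical row_store_only CREATE TABLE.

    Encodes the suggested first scope: UNIQUE KEY MoW only,
    store_row_column=true, full row store only, no rollups/materialized indexes.
    """
    def is_true(key):
        return props.get(key, "false").strip().lower() == "true"

    if not is_true("row_store_only"):
        return []  # property absent or false: nothing extra to check
    errors = []
    if keys_type != "UNIQUE_KEYS" or not merge_on_write:
        errors.append("row_store_only requires a UNIQUE KEY merge-on-write table")
    if not is_true("store_row_column"):
        errors.append("row_store_only requires store_row_column=true")
    if "row_store_columns" in props:
        # Partial row stores are rejected unless they list every visible column.
        selected = {c.strip() for c in props["row_store_columns"].split(",") if c.strip()}
        if selected != set(visible_columns):
            errors.append("row_store_only requires row_store_columns to list all visible columns")
    if rollups:
        errors.append("row_store_only does not support rollups/materialized indexes yet")
    return errors


props = {"row_store_only": "true", "store_row_column": "true", "row_store_columns": "k"}
print(validate_row_store_only(props, "UNIQUE_KEYS", True, ["k", "v"], []))
# ['row_store_only requires row_store_columns to list all visible columns']
```

The function collects every violation instead of stopping at the first. A design review can then see all incompatible combinations for a given DDL at once.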
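Point 3 under "What exists today" and implementation point 2 contrast the current writers with the proposed ones. The sketch below shows which columns would get a per-column segment writer in each mode. `Column`, `is_hidden`, and `columns_with_writers` are hypothetical. The sketch assumes two things: that the row-store payload is itself written as a hidden column, and that hidden bookkeeping columns keep their writers in row-store-only mode.

```python
from dataclasses import dataclass

ROW_STORE_COL = "__DORIS_ROW_STORE_COL__"


@dataclass(frozen=True)
class Column:
    # Simplified, hypothetical schema column used only for this sketch.
    name: str
    is_key: bool = False
    is_hidden: bool = False  # e.g. delete-sign / sequence / version columns


def columns_with_writers(schema, row_store_only):
    """Names of columns that would get a per-column segment writer."""
    if not row_store_only:
        # Current behavior: the row store is additive, so every schema column
        # (including the hidden row-store column) still gets a column writer.
        return [c.name for c in schema]
    # Proposed mode: skip non-key value columns; keep key columns, hidden
    # bookkeeping columns, and the row-store column that carries the values.
    return [c.name for c in schema if c.is_key or c.is_hidden or c.name == ROW_STORE_COL]


schema = [
    Column("k", is_key=True),
    Column("v1"),
    Column("v2"),
    Column("__DORIS_DELETE_SIGN__", is_hidden=True),
    Column(ROW_STORE_COL, is_hidden=True),
]
print(columns_with_writers(schema, row_store_only=True))
# ['k', '__DORIS_DELETE_SIGN__', '__DORIS_ROW_STORE_COL__']
```

The storage saving comes from the skipped `v1`/`v2` writers. Every writer path (segment, vertical, compaction, schema change) would need to apply the same filter consistently.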
