eldenmoon opened a new pull request, #63970:
URL: https://github.com/apache/doris/pull/63970

   ## Proposed changes
   
   This patch reduces parse-time memory for sparse plain dynamic Variant 
columns.
   
   - Parse plain dynamic non-doc Variant object JSON into doc-value KV during 
storage parse instead of eagerly expanding every path into subcolumns.
   - Keep the old eager subcolumn parse path for cases that still depend on 
parse-time path/type metadata: nested group, deprecated flatten nested, 
predefined typed paths, and parent inverted index columns.
   - Add a writer-side doc-value plan in `VariantColumnWriterImpl` to choose 
materialized paths, write them through the materialized subcolumn flow, and 
write the remaining paths to sparse columns.
   - Move sparse handling for this path into `VariantColumnWriterImpl` and add 
focused BE UT coverage.
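
The split described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: `VariantParseTraits`, `DocValuePlan`, `needs_eager_subcolumn_parse`, and `build_plan` are hypothetical names, and the selection rule (keep the most frequently populated paths up to a cap) is an assumption, since the description does not say how `VariantColumnWriterImpl` picks materialized paths.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flags for the four cases that keep the eager subcolumn parse.
struct VariantParseTraits {
    bool has_nested_group = false;
    bool uses_deprecated_flatten_nested = false;
    bool has_predefined_typed_paths = false;
    bool has_parent_inverted_index = false;
};

// True when parse-time path/type metadata is still required.
bool needs_eager_subcolumn_parse(const VariantParseTraits& t) {
    return t.has_nested_group || t.uses_deprecated_flatten_nested ||
           t.has_predefined_typed_paths || t.has_parent_inverted_index;
}

// Hypothetical stand-in for the writer-side plan; not the real Doris type.
struct DocValuePlan {
    std::vector<std::string> materialized_paths; // written as subcolumns
    std::vector<std::string> sparse_paths;       // written to sparse columns
};

// Assumed heuristic: materialize the `max_materialized` paths with the most
// non-null values and send every other path to the sparse columns.
DocValuePlan build_plan(const std::map<std::string, std::size_t>& non_null_counts,
                        std::size_t max_materialized) {
    std::vector<std::pair<std::string, std::size_t>> ranked(non_null_counts.begin(),
                                                            non_null_counts.end());
    // Sort by count, descending. The map iterates in path order and the sort
    // is stable, so ties keep path order and the result is deterministic.
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const auto& a, const auto& b) { return a.second > b.second; });

    DocValuePlan plan;
    for (std::size_t i = 0; i < ranked.size(); ++i) {
        if (i < max_materialized) {
            plan.materialized_paths.push_back(ranked[i].first);
        } else {
            plan.sparse_paths.push_back(ranked[i].first);
        }
    }
    // Every input path lands in exactly one of the two groups.
    assert(plan.materialized_paths.size() + plan.sparse_paths.size() ==
           non_null_counts.size());
    return plan;
}
```

In this sketch a caller would first check `needs_eager_subcolumn_parse` and only build a plan for columns that take the doc-value KV route.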
   
   The sparse parse memory UT simulates the CIR-20431 shape and shows:
   
   ```text
   old_subcolumns=1001
   new_subcolumns=1
   old_bytes=6224384
   new_bytes=45056
   ```
   
   These byte counts are `ColumnVariant::allocated_bytes()` as measured in the 
unit test (roughly a 138x reduction), not process RSS.
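
To see why eager expansion is costly for sparse data, here is a toy model, not Doris code. It assumes a shape where every row carries one path that no other row uses; the actual unit test's shape may differ. `EagerColumns`, `KvColumn`, and `sparse_row` are hypothetical names, and the model counts logical cells rather than allocated bytes.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Row = std::vector<std::pair<std::string, long>>;

// Eager model: one dense subcolumn per distinct path, padded with nulls.
struct EagerColumns {
    std::map<std::string, std::vector<std::optional<long>>> subcolumns;
    std::size_t num_rows = 0;

    void append_row(const Row& row) {
        // Every existing subcolumn grows by one (null) cell for the new row.
        for (auto& entry : subcolumns) entry.second.push_back(std::nullopt);
        for (const auto& kv : row) {
            auto it = subcolumns.find(kv.first);
            if (it == subcolumns.end()) {
                // A new path needs a null-filled subcolumn covering all rows.
                it = subcolumns
                             .emplace(kv.first,
                                      std::vector<std::optional<long>>(num_rows + 1))
                             .first;
            }
            it->second.back() = kv.second;
        }
        ++num_rows;
    }

    std::size_t cell_count() const {
        std::size_t n = 0;
        for (const auto& entry : subcolumns) n += entry.second.size();
        return n;
    }
};

// KV model: a single column that stores only the pairs present in each row.
struct KvColumn {
    std::vector<Row> rows;

    void append_row(const Row& row) { rows.push_back(row); }

    std::size_t cell_count() const {
        std::size_t n = 0;
        for (const auto& row : rows) n += row.size();
        return n;
    }
};

// Assumed sparse shape: row i has exactly one path, unique to that row.
Row sparse_row(std::size_t i) {
    return Row{{"k" + std::to_string(i), static_cast<long>(i)}};
}
```

With N such rows, the eager model holds N subcolumns of N cells each, while the KV model holds N pairs in one column, which is the kind of gap the unit-test numbers reflect.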
   
   ## Testing
   
   Current head `f971585d87bd73b0e4d447f2760004fc3a5f2051` on latest 
`upstream/master`:
   
   - `git diff --check upstream/master...HEAD`
   - `env DORIS_CLANG_HOME=/mnt/disk1/claude-max/ldb_toolchain20 
PATH=/mnt/disk1/claude-max/ldb_toolchain20/bin:$PATH ./run-be-ut.sh --run 
--filter='VariantUtilTest.ParseVariantColumns_StorageNonDocScalarJsonToDocValueKv:VariantUtilTest.SparseStorageParseUsesDocValueKvInsteadOfManySubcolumns:VariantUtilTest.ParseVariantColumns_StorageNonDocDocValueKvSkipsInvalidRoot:VariantColumnWriterReaderTest.test_storage_parse_kv_write_materialized_and_sparse'`
   
   Also verified before rebasing onto the latest master:
   
   - Release BE build passed.
   - `./run-be-ut.sh --run --filter='*Variant*'`: 193 passed, 1 skipped.
   - Targeted `variant_p0` suites passed, including `desc`, 
`test_types_in_variant`, delete/update, predefine typed-to-sparse, schema 
change, and external meta edge cases.
   - A full `variant_p0` run was attempted: 139 suites ran and 7 failed. All 
seven failures were environment or framework issues unrelated to this patch: 
one OSS `InvalidAccessKeyId` error for outfile, and six 
`/api/debug_point/remove/...` HTTP 500 failures while cleaning up debug points. 
No product assertion mismatch was found in the modified Variant writer/parse 
paths.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

