eldenmoon opened a new pull request, #63192:
URL: https://github.com/apache/doris/pull/63192
### What problem does this PR solve?
Issue Number: N/A
Related PR: N/A
Problem Summary: Doris could not read Iceberg v3 VARIANT columns from
Parquet files. This change maps Iceberg VARIANT to Doris VARIANT, decodes
the unshredded VARIANT metadata/value encoding, reads shredded typed_value
columns, and prunes the shredded Parquet leaf columns that the accessed
variant paths do not need. The pruning is exposed through a profile
observable.
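
To illustrate the "unshredded VARIANT metadata" part of the summary, here is a minimal Python sketch of decoding the metadata dictionary, based on my reading of the Parquet Variant binary encoding (version in the low 4 bits of the header byte, `offset_size_minus_one` in the top 2 bits, little-endian offsets). The function name `decode_variant_metadata` is hypothetical and this is not the C++ reader added by the PR.

```python
def decode_variant_metadata(buf):
    """Decode a Variant metadata blob into its list of dictionary strings.

    Sketch only: assumes the layout header | dictionary_size | offsets | bytes.
    """
    header = buf[0]
    version = header & 0x0F                    # low 4 bits
    if version != 1:
        raise ValueError("unsupported variant metadata version: %d" % version)
    offset_size = ((header >> 6) & 0x03) + 1   # top 2 bits store size - 1

    pos = 1
    dict_size = int.from_bytes(buf[pos:pos + offset_size], "little")
    pos += offset_size

    # dictionary_size + 1 offsets, each relative to the start of the string bytes
    offsets = []
    for _ in range(dict_size + 1):
        offsets.append(int.from_bytes(buf[pos:pos + offset_size], "little"))
        pos += offset_size

    strings = buf[pos:]
    return [strings[offsets[i]:offsets[i + 1]].decode("utf-8")
            for i in range(dict_size)]


# Hand-built example: version 1, 1-byte offsets, two keys "a" and "bc".
example = bytes([0x01, 0x02, 0x00, 0x01, 0x03]) + b"abc"
keys = decode_variant_metadata(example)
```

Object values refer to their keys by index into this dictionary, so a reader would decode the metadata before resolving field names in the value bytes.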
### Release note
Support reading Iceberg v3 VARIANT Parquet columns, including shredded
typed_value column pruning.
### Check List (For Author)
- Test: Regression test / Unit Test / Manual test
    - Regression test: `./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d external_table_p0/tvf -s test_local_tvf_iceberg_variant`
    - Unit Test: `JAVA_TOOL_OPTIONS=-Djdk.attach.allowAttachSelf=true ./run-fe-ut.sh --coverage --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest`
    - Manual test: `./build.sh --be --fe; build-support/clang-format.sh; build-support/check-format.sh; git diff --check`
- Behavior changed: Yes. Doris can read Iceberg v3 VARIANT Parquet columns
and avoid reading unneeded shredded typed_value/value leaves for selected
variant subpaths.
- Does this need documentation: No
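
The leaf pruning described under "Behavior changed" can be sketched roughly as follows. This is a conservative Python illustration, not the PR's logic: it assumes object-only shredding where each field appears as `typed_value.<field>.{value,typed_value}` under the variant root, and the names `logical_path` and `prune_leaves` are hypothetical. A leaf is kept when its field path is a prefix of an accessed path (the residual `value` may still hold the sub-field) or lies underneath one.

```python
def logical_path(leaf):
    # Drop the terminal "metadata"/"value"/"typed_value" component, then keep
    # the field names, which sit at odd positions:
    # typed_value.<field>.typed_value.<field>...
    return tuple(leaf[:-1][1::2])


def is_prefix(short, long):
    return tuple(long[:len(short)]) == tuple(short)


def prune_leaves(leaves, accessed_paths):
    """Return the physical leaves needed to answer the accessed variant paths."""
    if not accessed_paths:
        return list(leaves)  # whole variant requested: nothing to prune
    kept = []
    for leaf in leaves:
        fields = logical_path(leaf)
        if any(is_prefix(fields, p) or is_prefix(p, fields)
               for p in accessed_paths):
            kept.append(leaf)
    return kept


# Leaves of one shredded variant column, relative to the variant root.
LEAVES = [
    ("metadata",),
    ("value",),
    ("typed_value", "a", "value"),
    ("typed_value", "a", "typed_value"),
    ("typed_value", "b", "value"),
    ("typed_value", "b", "typed_value", "c", "value"),
    ("typed_value", "b", "typed_value", "c", "typed_value"),
]

only_a = prune_leaves(LEAVES, [("a",)])
only_b_c = prune_leaves(LEAVES, [("b", "c")])
```

With this rule, selecting only path `a` skips every leaf under `b`, while `metadata` and the root `value` are always retained because their field path is empty.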