morningman opened a new issue, #64065: URL: https://github.com/apache/doris/issues/64065
Apache Doris 4.0.6 is a maintenance release in the 4.0 series. It focuses on stability and operational experience. All 4.0.x users are advised to upgrade, and users running the compute-storage decoupled (cloud) mode benefit the most. Highlights of this release include: - **Compute-storage decoupling enhancements**: support read-write separation for Compaction, optimize Tablet scheduling and FE-side memory usage, and reduce RPC pressure on the Meta Service. - **Operational observability**: add Compaction task tracking (System Table and HTTP API), a dynamic `enable_recycler` switch, online adjustment of the memtable flush thread pool, and several brpc / bvar metrics. - **Data lineage**: introduce a Lineage SPI collection framework. - **Many correctness and stability fixes**: cover incorrect query results, load data loss (such as Broker Load loading only the first file, and PostgreSQL DML being silently dropped), crashes, hangs, and resource leaks. > Before upgrading, read the **Behavior Changes** section below, which contains several adjustments to default values and privileges. # Behavior Changes - **In compute-storage decoupled (cloud) mode, `enable_strict_consistency_dml` is now disabled by default** (#61891). If your workload relies on strict-consistency DML, explicitly enable this configuration after upgrading. - **The default trigger threshold for Time Series Compaction is changed from 2000 to 1000** (#61979), so Compaction triggers more aggressively. - **The default value of `segments_key_bounds_truncation_threshold` is changed to 36** (#61984). - **Cluster Snapshot commands now require root privileges** (#60239). - **JSONB and Variant columns are no longer allowed as Distribution Columns** (#63211). Such previously-incorrect usage is now rejected at table-creation time. # New Features & Improvements ## Functions & Types - Add the `mmhash3_u64_v2` hash function (#61846). - Add the `json_object_flatten` scalar function for flattening nested JSON (#62825). - Add a Lambda expression (functor) version of `array_sort` that supports custom sorting logic (#57828). - The `max` / `min` aggregate functions now support the Array type, and `max_by` / `min_by` support some complex types (#58490, #58736). ## Query Optimization - Limit the cost of not-null inference to reduce optimizer planning overhead for complex queries (#63318). - Scale the `num_nulls` column statistic proportionally after partition pruning to improve row-count estimation accuracy (#62265). ## Storage & Compaction - Add **Compaction task tracking**: query current Compaction tasks through a System Table and an HTTP API (#61696). - Support adjusting the memtable flush thread pool size online, without restarting BE (#60423). - Support adaptive Write Buffer sizing (#61810). - Release unused memory earlier to reduce memory usage during load and write (#62185). ## Compute-Storage Decoupling (Cloud Mode) - **Support read-write separation for Compaction** (#60310). - Schedule recently-active Tablets first to improve cache hit rate and scheduling efficiency (#59539, #57200, #61562). - Reduce `get_tablet_stats` RPC calls to the Meta Service to lower metadata-service pressure (#60543). - Optimize the memory usage of CloudTabletStatMgr / CloudTabletRebalancer (#59776, #61318). - Add the `enable_recycler` configuration to dynamically skip the Recycler (#63286). - Support hot-reloading of File Cache microbench configuration (#58922). ## Lakehouse - Iceberg supports IAM Role authentication for REST and S3 Tables (#60498). - Paimon supports Jindo OSS and Token propagation (#62106). - Support disabling View operations for the Iceberg REST Catalog (#63319). ## Data Lineage - Introduce a Lineage SPI framework to support data lineage collection (#61004). ## Observability & Operations - Add the `--drop_backends` parameter to `start_fe.sh` (#63306). - Add a transaction write-amplification brpc metric for Sub Txn Load (#63545). - Add bvar metrics for the Recycler Operation Log (#60520). - Statistics collection now automatically skips overly-long string columns to reduce collection overhead (#62686). - Remove the class histogram trace from the JDK17 startup parameters to avoid printing large amounts of logs during Full GC (#62422). ## Security & Authentication - Hide Token and authentication information in BE Info during Stream Load (#60656, #59743). - Improve the robustness and diagnostics of LDAP authentication (#61673). - Upgrade dependencies with known security vulnerabilities (#62274). # Important Bug Fixes ## Query Result Correctness - Fix `INTERSECT` / `EXCEPT` losing NULL rows during predicate pushdown (#62299). - Fix `count(null)` being incorrectly treated as `count(*)` (#62548). - Fix a NoSuchElementException when `count` contains a `MATCH_ALL` expression (#62172). - Fix incorrect results caused by losing a DateTimeV2 narrowing Cast during IN predicate simplification (#63343). - Fix loss of the `IN_LIST` Runtime Filter predicate in Key Range scenarios (#62115). - Fix materialized view rewriting merging partitions that the query does not need, which could cause incorrect results or performance issues (#63081). - Fix the `no_use_cbo_rule` Hint being silently ignored (#62358). - Fix FragmentMgr incorrectly canceling queries on a compute-storage decoupled Virtual Cluster (#62135). ## Views - Fix `ALTER VIEW` definitions not being synchronized to Follower FEs when a COMMENT is specified (#61670). - Fix incorrect query results caused by a View definition losing Variant subfields (#62907). - Fix View columns losing colUniqueId under Lazy Materialization (#62533). - Fix illegal Alias rewriting in View definitions (#63353). ## Functions & Types - Fix several incorrect results related to TIMESTAMPTZ and Daylight Saving Time (DST): LEAD/LAG not preserving the type, spring-forward gap handling, DST fold branch selection, switching time-difference calculation to UTC semantics, and TopN Runtime Predicate support (#62779, #62945, #63034, #63161, #63220). - Fix incorrect results for the `allow_zero_date` function (#61900). - Fix incorrect results when casting a scientific-notation string to Decimal (#63119). - Fix `from_olap_string` throwing an exception when it fails to parse a datetime; it now returns NULL instead (#63035). - Fix incorrect integer inference and type resolution for user variables (#62524). ## Variant Type - Fix Compaction failure when a Variant column has uid=0 in a keyless table (#62656). - Fix Variant subcolumns being lost after a partial-column update in the Row Store (#62067). - Fix a normalization issue when reading older-version single-segment dot-key subcolumns (#62409). - Keep the first occurrence when a Variant JSON Path is duplicated (#63697). - Optimize VariantStatsCaculator construction by skipping the full footer scan to improve performance (#62819). ## Load & Streaming **Data Loss / Integrity** - Fix Broker Load loading only the first file when multiple file paths are specified (#62969). - Fix PostgreSQL DML being silently dropped after a job restart (#61481). - Fix loss of S3 Offset and job statistics after an FE Checkpoint restart (#62449). - Fix incorrect Java type restoration of Split boundaries when reading FE-persisted CDC Offsets (#63219). **Hangs / Leaks** - Fix a VNodeChannel `close_wait` hang (#58024). - Fix a PostgreSQL Replication Slot leak caused by canceling a streaming job during pause/resume (#62010). - Fix a memory leak caused by an InsertLoadJob being stuck in PENDING (#62890). **Restart & Metadata** - Fix streaming job properties not being parsed after an FE restart (#62298). - Fix a StreamingInsertJob NPE during EditLog Replay (#62416). - Fix Broker Load storage properties not being correctly rebuilt after Gson Replay (#63094). - Fix loss of INSERT job statistics in `SHOW LOAD` after an FE restart (#62331). **Routine Load / Parsing** - Fix an NPE in Routine Load Kafka Meta requests (#63180). - Fix an IllegalMonitorStateException in Routine Load when the coordinating BE restarts (#62892). - Fix incorrect column parsing when a UTF-8 BOM CSV uses enclose (#62092). - Fix an NPE and load-availability issues when Group Commit selects a Backend (#60652, #61555). - Fix conflict validation between load properties and table properties for Broker Load / Routine Load (#58054). - Fix `to_load_error_http_path` returning an incorrect URL in HTTPS mode (#61785). ## Storage & Compaction - Fix BE crashes caused by Schema Cache concurrency issues related to OlapScanner (#61510, #62327). - Fix an IOContext use-after-free crash (#59947). - Fix reading older DecimalV2 Segments that are missing precision / frac (#63569). - Fix blocking during async close polling of Packed Files (#62938). - Fix an OpenMP concurrency-budget issue when building ANN vector indexes (#61313). ## Compute-Storage Decoupling (Cloud Mode) - Fix several correctness issues during Schema Change: delete local Rowsets before `add_rowsets`, fill Version holes before running, normalize the Rowset graph before Delete Bitmap Capture, and add an `SC_COMPACTION_CONFLICT` error code to retry failures across V1 Compaction (#62256, #63443, #63981, #62272). - Fix several Cache Warmup issues: lack of Packed File support, no retry after errors, NPEs on cancellation/expiration, and cache cleanup (#60375, #62886, #62805, #62839, #62941, #62931). - Fix a `balanced_tablets_shards` memory leak and a Warmup inflight counting issue (#59093, #60480). - Fix Recycler OOM caused by too many queued deletion tasks (#59331). - Fix a race condition between the first dynamic-partition initialization and a concurrent `CREATE MATERIALIZED VIEW` (#62755). - Fix exposing the internal `KV_TXN_MAYBE_COMMITTED` state to clients (#62244). - Fix Meta Service logs not being printed in cloud mode (#61766). - Fix `SHOW PROC` not showing the Cached Version of partitions (#60807). ## Lakehouse & External Data Sources - Fix loss of predicate filtering when Native and JNI Readers are mixed in FileScanner (#61802). - Fix a timezone offset issue for the Hive DATE type in external Readers (#61330). - Fix loss of Query TVF column aliases across JDBC Catalogs (#61939). - Fix incompatibility between `SHOW PARTITIONS` and the partitions TVF for external Catalogs (#62134). - Fix an out-of-bounds crash when generating partition columns for Iceberg / Paimon tables (#62177). - Fix several Parquet read/write issues: int96 Timestamp writing, Page V2 encoding, and conditional checks (#61832, #63779, #63305, #63509). - Fix preloading Jindo for the Paimon Scanner (#62351). - Fix a TVF returning an error when the Thrift message exceeds the size limit (#61788). ## File Cache & IO - Fix a SIGSEGV crash caused by background LRU updates during cache cleanup (#60533). - Fix temporary TTL expiration time being incorrectly anchored to the Tablet creation time (#62287). - Fix the file handle cache Key not including the HDFS connection, which could cause connection mismatches (#63516). ## Metadata & Job System - Fix `CANCEL ALTER` with an empty job list being mistakenly treated as canceling all Rollup jobs (#62712). - Fix a sessionVariables null pointer after an upgrade (#61959). - Fix `RestoreCommand` throwing an UnsupportedOperationException (#61890). - Fix Binlog files still being linked when Binlog is not enabled (#61949). - Fix auto-partition tables with `retention_count` not being correctly registered with the scheduler after a restart (#61954). - Fix a Host mismatch when starting FE in `metadata_failure_recovery` mode (#62748). - Fix being unable to run `SHOW TABLET` when no database is selected (#63280). - Fix inconsistent column ordering in `SHOW BACKENDS` (#62207). - Fix system tables returning unknown statistics (#62913). - Fix Audit Log issues: UnboundAlias digest calculation, and internal query failures being mislabeled as ERR (#62160, #62997). - Fix fe.out not being generated when using `start_fe.sh --console` (#61807). - Fix the error message for `INSERT OVERWRITE` (#62555). ## Security & Authentication - Fix Ranger column-level privileges being bypassed when combined with CTEs (#61741). - Fix client IP authentication for Arrow Flight (#63506). - Redact sensitive Headers in Stream Load logs (#62108). ## Stability & Misc - Fix BE crashes caused by Runtime Filters with a shared Hash table (#63257). - Fix a core dump caused by the inline static empty message in arrow::Status (#63191). - Fix Arrow UTF8 / String size limit issues (#63137). - Fix PartitionRebalancer generating illegal migrations to BEs that lack the required storage medium (#62206). - Fix `jetty_server_max_http_header_size` not taking effect under Jetty 12 (#61197). - Fix Prepared Statement QPS metrics not being counted when Audit Log is disabled (#61621). - Fix the partition-near-limit metric by changing it from a Counter to a Gauge (#61845). - Fix loading of the JNI log4j2 configuration (#63063). - Track IO-layer read buffers through MemTrackerLimiter to improve memory observability (#62288). > These notes focus on user-visible and operations-visible changes, and omit some purely-internal refactoring and fixes with no external symptoms. For the complete commit history, refer to the PR list for 4.0.6 in the apache/doris repository on GitHub. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
