zclllyybb commented on issue #64700: URL: https://github.com/apache/doris/issues/64700#issuecomment-4767954566
Breakwater-GitHub-Analysis-Slot: slot_6612ad2e0ca8

This content is generated by AI for reference only.

Initial triage: I cannot confirm a specific Doris 4.1.2 root cause from the pasted application logs alone.

The before/after samples are not the same workload shape. In the "before upgrade" section, the shown listener batches are roughly 200-658 source rows and each serialized application cycle is mostly about 2.7-4.7s. In the "after upgrade" section, the shown listener batches are about 895-1095 source rows, and some derived Stream Load calls reach about 1.9k-2.7k rows. Since the application appears to send many Stream Load requests sequentially on one Kafka listener thread, part of the end-to-end increase can come from larger batches plus serialized per-table loads.

That said, the evidence is still suspicious for a shared Doris-side bottleneck or environment bottleneck, not just row count. After upgrade, even very small loads such as 1-29 rows often take about 500-1300ms, and `device_inv_realtime` reaches 2.2s, 3.6s, and 6.2s in the pasted samples. That pattern would be consistent with transaction publish delay, backend write/resource pressure, compaction/tablet-version pressure, network/request-body delay, or client-side queuing, but the current logs do not identify which one.

Code-side anchor for maintainers: in Doris 4.1.2, the Stream Load BE response and BE INFO log split the latency into `LoadTimeMs`, `BeginTxnTimeMs`, `StreamLoadPutTimeMs`, `ReceiveDataTimeMs`, `ReadDataTimeMs`, `WriteDataTimeMs`, and `CommitAndPublishTimeMs` (`be/src/load/stream_load/stream_load_context.cpp`, plus the `finished to execute stream load` log in `be/src/service/http/action/stream_load.cpp`).

Those fields are the key evidence needed here:

- If `CommitAndPublishTimeMs` is high, focus on FE transaction publish, tablet version count, compaction backlog, or master FE pressure.
- If `WriteDataTimeMs` is high, focus on BE ingestion/storage/write path, schema/index cost, payload size, and resource pressure.
- If `ReceiveDataTimeMs` or the gap between client time and `LoadTimeMs` is high, focus on client serialization, HTTP request upload, redirect/network path, or the single Kafka listener thread.
- If `LoadTimeMs` is low while the application "Doris Stream Load" duration is high, the bottleneck is likely outside the Doris Stream Load execution itself.

Information needed to make this actionable:

1. The exact previous Doris version and whether any table schema, replica count, bucket count, BE/FE count, hardware, network path, or Stream Load client settings changed during the upgrade.
2. Full Stream Load JSON responses before and after upgrade for the same table and comparable payload size, including the timing fields above and the request label.
3. Matching BE logs around `finished to execute stream load. label=...` and FE master logs around begin/commit/publish for several slow labels.
4. DDL for the affected tables, especially whether they are Unique Key/MOW tables, whether partial update or sequence columns are used, indexes, partitions, buckets, and replication settings.
5. Stream Load request headers, especially `group_commit`, `two_phase_commit`, `merge_type`, `columns`, `partial_columns`, and format settings.
6. Cluster state during the slow window: `SHOW BACKENDS`, CPU/IO/network utilization, compaction backlog, tablet version count or tablet health output, and whether other loads/queries were running.
7. A minimal repro or a controlled comparison that sends the same payload to the same table before and after upgrade, preferably with the full server-side Stream Load timing fields.

Next suggested maintainer step: ask for the full Stream Load JSON responses and the matching BE/FE logs first.
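
For illustration only, the timing-field rules listed in this comment could be applied to a captured Stream Load JSON response roughly as follows. This is a minimal Python sketch, not Doris code: the function name `triage`, the `client_ms` argument (the duration measured by the application), and the 0.5 "dominant share" cut-off are all assumptions made for the example. Only the field names come from the comment itself.

```python
import json

# Arbitrary cut-off (an assumption for this sketch): a phase is called
# "dominant" when it accounts for at least this share of the total.
DOMINANT_SHARE = 0.5

# Field names are the ones quoted in the comment; the hints restate its rules.
HINTS = {
    "CommitAndPublishTimeMs": "FE transaction publish, tablet version count, "
                              "compaction backlog, or master FE pressure",
    "WriteDataTimeMs": "BE ingestion/storage/write path, schema/index cost, "
                       "payload size, or resource pressure",
    "ReceiveDataTimeMs": "client serialization, HTTP request upload, "
                         "redirect/network path, or the Kafka listener thread",
}


def triage(response_json, client_ms, share=DOMINANT_SHARE):
    """Return a one-line hint for a single Stream Load response.

    response_json: the JSON body returned by Stream Load, as a string.
    client_ms: wall-clock duration measured by the client (hypothetical input).
    """
    resp = json.loads(response_json)
    load_ms = resp.get("LoadTimeMs")
    if not load_ms or client_ms <= 0:
        return "insufficient timing data"

    # Large gap between client time and LoadTimeMs: look outside Doris.
    if (client_ms - load_ms) / client_ms >= share:
        return "outside Doris: client queuing, serialization, or network path"

    # Otherwise report the first server-side phase that dominates LoadTimeMs.
    for field, hint in HINTS.items():
        if resp.get(field, 0) / load_ms >= share:
            return f"{field}: {hint}"

    return "no dominant phase: compare fields across several slow labels"


if __name__ == "__main__":
    # Made-up numbers, not taken from the issue.
    sample = json.dumps({
        "LoadTimeMs": 1200,
        "CommitAndPublishTimeMs": 900,
        "WriteDataTimeMs": 150,
        "ReceiveDataTimeMs": 40,
    })
    print(triage(sample, client_ms=1300))
```

Running this over several slow labels before and after the upgrade would show whether the same phase dominates consistently, which is the comparison the requested responses are meant to enable.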
Without those timing fields, this issue should remain open as a possible write-latency regression, but the current public evidence is insufficient to assign it to a concrete Doris 4.1.2 code bug.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
