mongo360 opened a new pull request, #23679:
URL: https://github.com/apache/doris/pull/23679
Problem:
The stream load record returned by the _show stream load_ command is
sometimes wrong when a stream load transaction reuses the label of a
previously aborted transaction.
Example:
1. create test table
> CREATE TABLE `test_table` (
  `id` bigint(20) NOT NULL,
  `create_day` date NOT NULL,
  `line` bigint(20) SUM NULL DEFAULT "0"
) ENGINE = OLAP
AGGREGATE KEY(`id`, `create_day`)
COMMENT 'olap'
PARTITION BY RANGE(`create_day`) (
  PARTITION pbefroe202308 VALUES [('0000-01-01'), ('2023-08-01')),
  PARTITION p202308 VALUES [('2023-08-01'), ('2023-09-01')),
  PARTITION p202309 VALUES [('2023-09-01'), ('2023-10-01')),
  PARTITION p202310 VALUES [('2023-10-01'), ('2023-11-01'))
)
DISTRIBUTED BY HASH(`id`) BUCKETS 16
PROPERTIES (
  "replication_allocation" = "tag.location.default: 2",
  "dynamic_partition.enable" = "true",
  "dynamic_partition.time_unit" = "month",
  "dynamic_partition.time_zone" = "Asia/Shanghai",
  "dynamic_partition.start" = "-2147483648",
  "dynamic_partition.end" = "5",
  "dynamic_partition.prefix" = "p",
  "dynamic_partition.replication_allocation" = "tag.location.default: 2",
  "dynamic_partition.buckets" = "64",
  "dynamic_partition.create_history_partition" = "false",
  "dynamic_partition.history_partition_num" = "-1",
  "dynamic_partition.hot_partition_num" = "0",
  "dynamic_partition.reserved_history_periods" = "NULL",
  "dynamic_partition.storage_policy" = "",
  "dynamic_partition.storage_medium" = "HDD",
  "dynamic_partition.start_day_of_month" = "1",
  "in_memory" = "false",
  "storage_format" = "V2",
  "disable_auto_compaction" = "false"
);
2. create empty csv file for abort transaction
> /tmp/abort.txt
3. create csv file for commit transaction
> /tmp/commit.txt
3000,2023-08-10,1
3001,2023-08-10,2
3002,2023-08-10,3
4. create the first 2PC stream load transaction and abort it
> curl --location-trusted -u admin:pwd -H "label:test_label_01" \
  -H "column_separator:," -H "format:csv" -H "two_phase_commit:true" \
  -T /tmp/abort.txt http://127.0.0.1:8030/api/db/test_table/_stream_load
> curl -X PUT --location-trusted -u admin:pwd -H "txn_id:1001" \
  -H "txn_operation:abort" http://127.0.0.1:8030/api/db/_stream_load_2pc
5. create a second 2PC stream load transaction with the same label as the
first one and commit it
> curl --location-trusted -u admin:pwd -H "label:test_label_01" \
  -H "column_separator:," -H "format:csv" -H "two_phase_commit:true" \
  -T /tmp/commit.txt http://127.0.0.1:8030/api/db/test_table/_stream_load
> curl -X PUT --location-trusted -u admin:pwd -H "txn_id:1002" \
  -H "txn_operation:commit" http://127.0.0.1:8030/api/db/_stream_load_2pc
6. show stream load
>
+---------------+----+------------+-------+-----------+---------+---------+-----+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+
| Label         | Db | Table      | User  | ClientIp  | Status  | Message | Url | TotalRows | LoadedRows | FilteredRows | UnselectedRows | LoadBytes | StartTime               | FinishTime              |
+---------------+----+------------+-------+-----------+---------+---------+-----+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+
| test_label_01 | db | test_table | admin | 11.10.0.1 | Success | OK      | N/A | 0         | 0          | 0            | 0              | 0         | 2023-08-29 10:00:00.617 | 2023-08-29 10:00:00.817 |
+---------------+----+------------+-------+-----------+---------+---------+-----+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+
The record shown belongs to the first, aborted transaction (zero rows
loaded). The expected result is the record of the second, committed
transaction.
Reason:
The label of an aborted transaction can be reused; this is allowed in
DatabaseTransactionMgr::beginTransaction. However, when stream load records
are fetched from the BE in StreamLoadRecordMgr::runAfterCatalogReady, two
records arrive, one for each transaction, both carrying the same label.
StreamLoadRecordMgr::addStreamLoadRecord then checks only whether the label
already exists before adding a record:
> if (!labelToStreamLoadRecord.containsKey(label)) {
labelToStreamLoadRecord.put(label, streamLoadRecord);
}
So if the record of the first, aborted transaction is added to the map
first, the record of the second transaction is discarded.
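
To make the failure mode concrete, here is a minimal standalone sketch of the label-only check. It is not the Doris code: `Rec`, `replay`, and the `outcome` strings are hypothetical names invented for illustration, and only the `containsKey`/`put` pattern is taken from the snippet quoted above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LabelOnlyCheckDemo {
    // Hypothetical, simplified stand-in for a stream load record.
    static final class Rec {
        final String label;
        final long txnId;
        final String outcome; // illustrative only, not a Doris status value

        Rec(String label, long txnId, String outcome) {
            this.label = label;
            this.txnId = txnId;
            this.outcome = outcome;
        }
    }

    // Replays records in arrival order using the label-only check.
    static Map<String, Rec> replay(List<Rec> arrivals) {
        Map<String, Rec> labelToRecord = new HashMap<>();
        for (Rec r : arrivals) {
            if (!labelToRecord.containsKey(r.label)) {
                labelToRecord.put(r.label, r);
            }
        }
        return labelToRecord;
    }

    public static void main(String[] args) {
        Map<String, Rec> shown = replay(List.of(
                new Rec("test_label_01", 1001L, "aborted"),
                new Rec("test_label_01", 1002L, "committed")));
        Rec kept = shown.get("test_label_01");
        // The first record wins, so the committed one is never stored.
        System.out.println(kept.txnId + " " + kept.outcome); // prints "1001 aborted"
    }
}
```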
Solution:
When the label already exists, check whether the finish time of the current
transaction is later than that of the record already in the map. If it is,
replace the existing record with the current transaction's record.
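
The rule described here could look roughly like the sketch below. This is an illustration under assumptions, not the code in this PR: `Rec`, `addRecord`, and the `finishTimeMs` field are hypothetical, and finish times are modelled as plain millisecond values.

```java
import java.util.HashMap;
import java.util.Map;

class NewestFinishTimeWins {
    // Hypothetical, simplified stand-in for a stream load record.
    static final class Rec {
        final String label;
        final long txnId;
        final long finishTimeMs; // assumed representation of the finish time

        Rec(String label, long txnId, long finishTimeMs) {
            this.label = label;
            this.txnId = txnId;
            this.finishTimeMs = finishTimeMs;
        }
    }

    // Stores the record if the label is new, or if it finished later than
    // the record currently stored under the same label.
    static void addRecord(Map<String, Rec> labelToRecord, Rec incoming) {
        Rec existing = labelToRecord.get(incoming.label);
        if (existing == null || incoming.finishTimeMs > existing.finishTimeMs) {
            labelToRecord.put(incoming.label, incoming);
        }
    }

    public static void main(String[] args) {
        Map<String, Rec> labelToRecord = new HashMap<>();
        addRecord(labelToRecord, new Rec("test_label_01", 1001L, 100L)); // aborted txn
        addRecord(labelToRecord, new Rec("test_label_01", 1002L, 200L)); // later txn
        System.out.println(labelToRecord.get("test_label_01").txnId); // prints 1002
    }
}
```

With a strict `>` comparison, a record with an equal or earlier finish time never displaces the stored one, so the result does not depend on the order in which the two records arrive.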