This is an automated email from the ASF dual-hosted git repository.
joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
The following commit(s) were added to refs/heads/master by this push:
new 5f06f4743 IMPALA-13934: Do quick pointer comparison in IcebergDeleteBuilder
5f06f4743 is described below
commit 5f06f4743007bda13d1d45c2a16adab472e7ba23
Author: Zoltan Borok-Nagy <[email protected]>
AuthorDate: Thu Apr 3 21:12:51 2025 +0200
IMPALA-13934: Do quick pointer comparison in IcebergDeleteBuilder
Since IMPALA-13194, file paths are deduplicated in the serialized
position delete records. Therefore we can do a quick pointer-based
comparison of consecutive position delete records instead of the costly
string comparison.
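A minimal sketch of why deduplication makes pointer equality meaningful. It assumes that after deserialization every row sharing a file path points at one stored copy of that path. PathSlot, DedupBatch and BuildDedupBatch are hypothetical names, and this is not the IMPALA-13194 serialization code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical string slot: a pointer into batch-owned memory plus a length.
struct PathSlot {
  const char* ptr;
  std::size_t len;
};

// Hypothetical deserialized batch in which a run of identical consecutive
// file paths is stored once and shared by every row in that run.
struct DedupBatch {
  std::vector<std::string> storage;  // one copy per run of equal paths
  std::vector<PathSlot> rows;        // one slot per position delete record
};

DedupBatch BuildDedupBatch(const std::vector<std::string>& paths) {
  DedupBatch batch;
  // Reserving up front means push_back never reallocates, so pointers
  // handed out to earlier rows stay valid.
  batch.storage.reserve(paths.size());
  for (const std::string& p : paths) {
    // Store a new copy only when the path differs from the previous row's.
    if (batch.storage.empty() || batch.storage.back() != p) {
      batch.storage.push_back(p);
    }
    const std::string& shared = batch.storage.back();
    batch.rows.push_back({shared.data(), shared.size()});
  }
  return batch;
}
```

With such a layout, two consecutive rows for the same data file hold the same pointer, so one pointer comparison replaces a byte-by-byte string comparison.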
If the pointers don't match, we still need to compare the strings for
equality, because position delete records coming from different senders
can be coalesced into a single row batch by the EXCHANGE RECEIVER.
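The fast path with its fallback can be sketched as a standalone predicate. PathSlot and SamePath are hypothetical names standing in for Impala's string slot and the inline condition in the diff. The committed line compares only Ptr(), which is enough when equal pointers imply equal deduplicated strings. The extra length check here is a conservative choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical string slot: pointer plus length, like a deserialized string.
struct PathSlot {
  const char* ptr;
  std::size_t len;
};

// Returns true when two slots hold the same file path.
bool SamePath(const PathSlot& a, const PathSlot& b) {
  // Fast path: rows sharing one deduplicated buffer compare equal
  // without reading any bytes.
  if (a.ptr == b.ptr && a.len == b.len) return true;
  // Slow path: the same path may live in different buffers when batches
  // from different senders are coalesced, so compare the contents.
  return std::string_view(a.ptr, a.len) == std::string_view(b.ptr, b.len);
}
```

For example, two copies of the same path held in separate std::string objects have different data() pointers. SamePath still reports them equal via the slow path.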
Measurements
The data table had ~1 trillion data records and ~68 billion position
delete records. Average time spent in the IcebergDeleteBuilder:
+------------+----------+-----------+
| Node count | Original | Optimized |
+------------+----------+-----------+
| 5          | 12m11s   | 9m47s     |
| 10         | 6m2s     | 5m        |
| 20         | 3m1s     | 2m30s     |
| 40         | 1m30s    | 1m15s     |
+------------+----------+-----------+
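Converting each time in the table to seconds gives the relative reduction in builder time for each cluster size:

```latex
\[
\text{reduction} = \frac{t_{\text{orig}} - t_{\text{opt}}}{t_{\text{orig}}}
\]
\[
\begin{aligned}
5 \text{ nodes:}\;&  \frac{731 - 587}{731} \approx 19.7\% \\
10 \text{ nodes:}\;& \frac{362 - 300}{362} \approx 17.1\% \\
20 \text{ nodes:}\;& \frac{181 - 150}{181} \approx 17.1\% \\
40 \text{ nodes:}\;& \frac{90 - 75}{90}    \approx 16.7\%
\end{aligned}
\]
```

The improvement stays at roughly 17-20% across cluster sizes. Absolute times roughly halve each time the node count doubles.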
It's essential to optimize the builder, as it blocks all the probe
threads of the IcebergDeleteNode.
Testing
* No behaviour change; existing tests provide coverage.
Change-Id: Ie171f912a5518b6e6a445efba9d39748ecec5a36
Reviewed-on: http://gerrit.cloudera.org:8080/22737
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
be/src/exec/iceberg-delete-builder.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/be/src/exec/iceberg-delete-builder.cc b/be/src/exec/iceberg-delete-builder.cc
index fc4833ace..510d350fc 100644
--- a/be/src/exec/iceberg-delete-builder.cc
+++ b/be/src/exec/iceberg-delete-builder.cc
@@ -298,7 +298,7 @@ Status IcebergDeleteBuilder::ProcessBuildBatch(RuntimeState* state,
file_path = build_row->GetTuple(0)->GetStringSlot(file_path_offset_);
pos = *build_row->GetTuple(0)->GetBigIntSlot(pos_offset_);
- if (*file_path == prev_file_path) {
+ if (file_path->Ptr() == prev_file_path.Ptr() || *file_path == prev_file_path) {
pos_buffer.push_back(pos);
} else {
RETURN_IF_ERROR(AddToDeletedRows(prev_file_path, pos_buffer));
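The hunk above shows only part of the build loop. This is a hypothetical, self-contained sketch of the pattern that loop appears to follow. Positions are buffered while the file path stays the same, and the buffer is handed off when the path changes. The sketch assumes records arrive grouped by file path. DeleteRecord, DeletedRows, FlushRun and GroupByFile are illustrative names. FlushRun only loosely stands in for AddToDeletedRows() and is not Impala's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical position delete record: data file path plus row position.
struct DeleteRecord {
  std::string_view file_path;
  std::int64_t pos;
};

// Hypothetical sink: deleted positions collected per data file.
using DeletedRows = std::map<std::string, std::vector<std::int64_t>>;

// Hands off one run of positions for `path`, then empties the buffer.
void FlushRun(std::string_view path, std::vector<std::int64_t>& buf,
              DeletedRows& out) {
  if (buf.empty()) return;
  std::vector<std::int64_t>& dst = out[std::string(path)];
  dst.insert(dst.end(), buf.begin(), buf.end());
  buf.clear();
}

void GroupByFile(const std::vector<DeleteRecord>& records, DeletedRows& out) {
  std::string_view prev_path;
  std::vector<std::int64_t> pos_buffer;
  for (const DeleteRecord& rec : records) {
    // Cheap pointer check first, then a content check for equal paths
    // that live in different buffers.
    bool same_ptr = rec.file_path.data() == prev_path.data() &&
                    rec.file_path.size() == prev_path.size();
    if (same_ptr || rec.file_path == prev_path) {
      pos_buffer.push_back(rec.pos);
    } else {
      FlushRun(prev_path, pos_buffer, out);
      prev_path = rec.file_path;
      pos_buffer.push_back(rec.pos);
    }
  }
  FlushRun(prev_path, pos_buffer, out);  // emit the final run
}
```

When most consecutive records share a deduplicated path, the loop usually takes the pointer branch. The string comparison runs only at run boundaries or where batches from different senders meet.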