This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new e894a03bea perf: Use Hashbrown for array_distinct (#20538)
e894a03bea is described below
commit e894a03bea638e35677eaf27876966013dd64bf4
Author: Neil Conway <[email protected]>
AuthorDate: Wed Feb 25 13:12:42 2026 -0500
perf: Use Hashbrown for array_distinct (#20538)
## Which issue does this PR close?
N/A
## Rationale for this change
#20364 recently optimized `array_distinct` to use batched row
conversion. That PR used `std::collections::HashSet`. This PR
replaces it with `hashbrown::HashSet`, which measurably improves
performance.
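As a rough illustration of why the swap is a one-line change, here is a minimal sketch of set-based deduplication. It is not the code in `set_ops.rs`: the `distinct_rows` helper and the use of byte slices as stand-ins for row-encoded elements are assumptions made for this example. It uses the standard-library set so it compiles without extra dependencies. Because `hashbrown::HashSet` offers the same `with_capacity` and `insert` methods, changing the `use` line should be the only edit needed.

```rust
// Illustrative only: the commit replaces this import with `use hashbrown::HashSet;`.
use std::collections::HashSet;

/// Returns the distinct rows in first-occurrence order.
/// `rows` is a hypothetical stand-in for row-encoded array elements.
fn distinct_rows<'a>(rows: &[&'a [u8]]) -> Vec<&'a [u8]> {
    let mut seen: HashSet<&[u8]> = HashSet::with_capacity(rows.len());
    let mut out = Vec::new();
    for &row in rows {
        // `insert` returns true only the first time a value is added.
        if seen.insert(row) {
            out.push(row);
        }
    }
    out
}
```

Calling `distinct_rows` on the rows `a, b, a, c, b` would keep `a, b, c`. The helper body is identical under either set type, so any speedup would come from the set implementation and its default hasher.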
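## What changes are included in this PR?
Swaps the `HashSet` import in `datafusion/functions-nested/src/set_ops.rs`
from `std::collections::HashSet` to `hashbrown::HashSet`.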
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.
---
datafusion/functions-nested/src/set_ops.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/datafusion/functions-nested/src/set_ops.rs b/datafusion/functions-nested/src/set_ops.rs
index 2348b3c530..150559111f 100644
--- a/datafusion/functions-nested/src/set_ops.rs
+++ b/datafusion/functions-nested/src/set_ops.rs
@@ -34,8 +34,8 @@ use datafusion_expr::{
     ColumnarValue, Documentation, ScalarUDFImpl, Signature, Volatility,
 };
 use datafusion_macros::user_doc;
+use hashbrown::HashSet;
 use std::any::Any;
-use std::collections::HashSet;
 use std::fmt::{Display, Formatter};
 use std::sync::Arc;