This is an automated email from the ASF dual-hosted git repository.
jorgecarleitao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 5228ede ARROW-11086: [Rust] Extend take implementation to more index types
5228ede is described below
commit 5228ede9abf8ecf9b4bb68a06075cd16af3523e9
Author: Heres, Daniel <[email protected]>
AuthorDate: Sat Jan 2 23:28:03 2021 +0000
ARROW-11086: [Rust] Extend take implementation to more index types
## Context
The context of this PR is that I want to experiment with a simplified
implementation of the hash join in DataFusion that indexes the build-side
array directly instead of keeping a list of batches. This array could grow
beyond 2^32 (about 4.3 billion) elements, so it would need indexes of type
`UInt64` rather than `UInt32`.
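To make the size limit concrete, here is a minimal sketch using only the Rust standard library. The helper name `fits_in_u32` is hypothetical and not part of Arrow or DataFusion; it only checks whether a row position can be stored in a 32-bit index.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: returns true when a row position can be
/// represented by a 32-bit unsigned index.
fn fits_in_u32(row: u64) -> bool {
    // `u32::try_from` fails for any value above u32::MAX (2^32 - 1).
    u32::try_from(row).is_ok()
}
```

Under this sketch, the last position a `UInt32` index can address is 2^32 - 1; position 2^32 already needs a wider type such as `UInt64`.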
## Implementation
In this PR I extend the public `take` to accept indices of any `IndexType` that
implements `ArrowNumericType` and whose native type implements `ToPrimitive`.
I am not sure why `take` was previously restricted to `UInt32Array`.
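The idea of making the index type generic can be sketched without Arrow at all. The example below is an illustration under stated assumptions, not the kernel in `take.rs`: `take_sketch` is a hypothetical function over plain slices, and it uses the standard library's `TryFrom` bound where the actual change uses `ArrowNumericType` and `ToPrimitive`.

```rust
use std::convert::TryFrom;
use std::fmt::Debug;

/// Hypothetical stand-in for a take kernel: gathers `values[i]` for every
/// index `i`. The index type `I` is generic; any integer type that can be
/// converted to `usize` is accepted.
fn take_sketch<T, I>(values: &[T], indices: &[I]) -> Result<Vec<T>, String>
where
    T: Clone,
    I: Copy + Debug,
    usize: TryFrom<I>,
{
    indices
        .iter()
        .map(|&raw| {
            // Convert the index to usize; negative or oversized values fail here.
            let i = usize::try_from(raw)
                .map_err(|_| format!("index {:?} does not fit in usize", raw))?;
            // Bounds-checked lookup; out-of-range indices become an error.
            values
                .get(i)
                .cloned()
                .ok_or_else(|| format!("index {} out of bounds for length {}", i, values.len()))
        })
        .collect()
}
```

With this shape, the same function accepts `&[u32]` and `&[u64]` index slices, and the caller picks the width that matches the size of the array being indexed.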
Closes #9057 from Dandandan/take_index
Authored-by: Heres, Daniel <[email protected]>
Signed-off-by: Jorge C. Leitao <[email protected]>
---
rust/arrow/src/compute/kernels/take.rs | 12 ++++++++----
rust/datafusion/src/physical_plan/hash_join.rs | 3 ++-
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/rust/arrow/src/compute/kernels/take.rs b/rust/arrow/src/compute/kernels/take.rs
index c8a2a02..85567ad 100644
--- a/rust/arrow/src/compute/kernels/take.rs
+++ b/rust/arrow/src/compute/kernels/take.rs
@@ -76,12 +76,16 @@ macro_rules! downcast_dict_take {
/// # Ok(())
/// # }
/// ```
-pub fn take(
+pub fn take<IndexType>(
values: &Array,
- indices: &UInt32Array,
+ indices: &PrimitiveArray<IndexType>,
options: Option<TakeOptions>,
-) -> Result<ArrayRef> {
- take_impl::<UInt32Type>(values, indices, options)
+) -> Result<ArrayRef>
+where
+ IndexType: ArrowNumericType,
+ IndexType::Native: ToPrimitive,
+{
+ take_impl(values, indices, options)
}
fn take_impl<IndexType>(
diff --git a/rust/datafusion/src/physical_plan/hash_join.rs b/rust/datafusion/src/physical_plan/hash_join.rs
index a9d5963..50800df 100644
--- a/rust/datafusion/src/physical_plan/hash_join.rs
+++ b/rust/datafusion/src/physical_plan/hash_join.rs
@@ -316,7 +316,8 @@ fn build_batch_from_indices(
// 2. based on the pick, `take` items from the different recordBatches
let mut columns: Vec<Arc<dyn Array>> =
Vec::with_capacity(schema.fields().len());
- let right_indices = indices.iter().map(|(_, join_index)| join_index).collect();
+ let right_indices: UInt32Array =
+ indices.iter().map(|(_, join_index)| join_index).collect();
for field in schema.fields() {
// pick the column (left or right) based on the field name.