Christian Beilschmidt created ARROW-8508:
--------------------------------------------
Summary: [Rust] ListBuilder of FixedSizeListBuilder creates wrong offsets
Key: ARROW-8508
URL: https://issues.apache.org/jira/browse/ARROW-8508
Project: Apache Arrow
Issue Type: Bug
Components: Rust
Affects Versions: 0.16.0
Reporter: Christian Beilschmidt
I created an example of storing multi points with Arrow.
# A coordinate consists of two floats (Float64Builder)
# A multi point consists of one or more coordinates (FixedSizeListBuilder)
# A list of multi points consists of multiple multi points (ListBuilder)
This is the corresponding code snippet:
{code:java}
let float_builder = arrow::array::Float64Builder::new(0);
let coordinate_builder = arrow::array::FixedSizeListBuilder::new(float_builder, 2);
let mut multi_point_builder = arrow::array::ListBuilder::new(coordinate_builder);
multi_point_builder
    .values()
    .values()
    .append_slice(&[0.0, 0.1])
    .unwrap();
multi_point_builder.values().append(true).unwrap();
multi_point_builder
    .values()
    .values()
    .append_slice(&[1.0, 1.1])
    .unwrap();
multi_point_builder.values().append(true).unwrap();
multi_point_builder.append(true).unwrap(); // first multi point
multi_point_builder
    .values()
    .values()
    .append_slice(&[2.0, 2.1])
    .unwrap();
multi_point_builder.values().append(true).unwrap();
multi_point_builder
    .values()
    .values()
    .append_slice(&[3.0, 3.1])
    .unwrap();
multi_point_builder.values().append(true).unwrap();
multi_point_builder
    .values()
    .values()
    .append_slice(&[4.0, 4.1])
    .unwrap();
multi_point_builder.values().append(true).unwrap();
multi_point_builder.append(true).unwrap(); // second multi point
let multi_point = dbg!(multi_point_builder.finish());
// first multi point: expect two coordinates
let first_multi_point_ref = multi_point.value(0);
let first_multi_point: &arrow::array::FixedSizeListArray =
    first_multi_point_ref.as_any().downcast_ref().unwrap();
let coordinates_ref = first_multi_point.values();
let coordinates: &arrow::array::Float64Array =
    coordinates_ref.as_any().downcast_ref().unwrap();
assert_eq!(coordinates.value_slice(0, 2 * 2), &[0.0, 0.1, 1.0, 1.1]);
// second multi point: expect three coordinates
let second_multi_point_ref = multi_point.value(1);
let second_multi_point: &arrow::array::FixedSizeListArray =
    second_multi_point_ref.as_any().downcast_ref().unwrap();
let coordinates_ref = second_multi_point.values();
let coordinates: &arrow::array::Float64Array =
    coordinates_ref.as_any().downcast_ref().unwrap();
assert_eq!(coordinates.value_slice(0, 2 * 3), &[2.0, 2.1, 3.0, 3.1, 4.0, 4.1]);
{code}
The second assertion fails: the slice is {{[0.0, 0.1, 1.0, 1.1, 2.0, 2.1]}} instead of the expected {{[2.0, 2.1, 3.0, 3.1, 4.0, 4.1]}}.
The debug output produced by {{dbg!}} shows the same problem:
{noformat}
[
FixedSizeListArray<2>
[
PrimitiveArray<Float64>
[
0.0,
0.1,
],
PrimitiveArray<Float64>
[
1.0,
1.1,
],
],
FixedSizeListArray<2>
[
PrimitiveArray<Float64>
[
0.0,
0.1,
],
PrimitiveArray<Float64>
[
1.0,
1.1,
],
PrimitiveArray<Float64>
[
2.0,
2.1,
],
],
]{noformat}
The second list should instead contain the coordinates {{[2.0, 2.1]}}, {{[3.0, 3.1]}} and {{[4.0, 4.1]}}.
So either I am using the builder incorrectly or there is a bug in the offset handling. I
tested with {{0.16}} as well as the current {{master}} from GitHub.
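
To make the expected layout concrete, here is a minimal sketch in plain Rust that does not use the arrow crate. It assumes the outer list stores offsets {{[0, 2, 5]}} (counted in coordinates) over one flat buffer of floats. Both helper functions are hypothetical and only illustrate the index arithmetic; they are not arrow APIs. The second helper keeps the length but ignores the start offset, which is merely consistent with the output observed above and is not a confirmed diagnosis.

```rust
/// Hypothetical helper (not part of the arrow crate): returns the flat
/// float values belonging to list entry `i`, given list offsets counted
/// in fixed-size elements and the fixed element size.
fn multi_point_values<'a>(
    values: &'a [f64],
    list_offsets: &[usize],
    size: usize,
    i: usize,
) -> &'a [f64] {
    let start = list_offsets[i] * size;
    let end = list_offsets[i + 1] * size;
    &values[start..end]
}

/// Hypothetical helper: the same lookup, but with the start offset
/// dropped, so every entry is read from the beginning of the buffer.
fn multi_point_values_ignoring_start<'a>(
    values: &'a [f64],
    list_offsets: &[usize],
    size: usize,
    i: usize,
) -> &'a [f64] {
    let len = (list_offsets[i + 1] - list_offsets[i]) * size;
    &values[..len]
}
```

With the ten floats from the example and offsets {{[0, 2, 5]}}, the first helper yields {{[2.0, 2.1, 3.0, 3.1, 4.0, 4.1]}} for entry 1, while the second yields {{[0.0, 0.1, 1.0, 1.1, 2.0, 2.1]}}, which matches the failing assertion.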