bkietz commented on a change in pull request #9532:
URL: https://github.com/apache/arrow/pull/9532#discussion_r583924317
##########
File path: cpp/src/arrow/dataset/expression_test.cc
##########
@@ -1135,5 +1175,36 @@ TEST(Expression, SerializationRoundTrips) {
equal(field_ref("beta"), literal(3.25f))}));
}
+TEST(Projection, AugmentWithNull) {
+ auto just_i32 = ArrayFromJSON(struct_({kBoringSchema->GetFieldByName("i32")}),
+ R"([{"i32": 0}, {"i32": 1}, {"i32": 2}])");
+
+ {
+ ASSERT_OK_AND_ASSIGN(auto proj, project({field_ref("f64"), field_ref("i32")},
+ {"projected double", "projected int"})
+ .Bind(*kBoringSchema));
+
+ auto expected = ArrayFromJSON(
+ struct_({field("projected double", float64()), field("projected int", int32())}),
+ R"([[null, 0], [null, 1], [null, 2]])");
+ ASSERT_OK_AND_ASSIGN(auto actual, ExecuteScalarExpression(proj, just_i32));
+
+ AssertDatumsEqual(Datum(expected), actual);
+ }
+
+ {
+ ASSERT_OK_AND_ASSIGN(
+ auto proj,
+ project({field_ref("f64")}, {"projected double"}).Bind(*kBoringSchema));
+
+ // NB: only a scalar was projected, this is *not* automatically broadcast to an array.
+ ASSERT_OK_AND_ASSIGN(auto expected, StructScalar::Make({MakeNullScalar(float64())},
Review comment:
Ah, I see your concern. Individual calls to project deliberately do not
broadcast scalars, so that subsequent steps in the pipeline can do something
more efficient with them. FilterAndProjectScanTask broadcasts scalars to the
correct length before yielding the batch:
https://github.com/apache/arrow/pull/9532/files?file-filters%5B%5D=.cc&file-filters%5B%5D=.h&file-filters%5B%5D=.java&file-filters%5B%5D=.pxd&file-filters%5B%5D=.py#diff-25b1bd283e8242f8384b24a0f1e8b61fbca0c2784ab679f9a2a00b03450487aaR72-R76
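
To illustrate the split of responsibilities described above, here is a minimal, self-contained sketch of a late broadcast step. It is a toy model only: `Value`, `Column`, `ToyDatum`, and `BroadcastToLength` are hypothetical names invented for this example, not Arrow types or functions, and the code linked above may well be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <variant>
#include <vector>

// Toy stand-ins, NOT Arrow types: a nullable double, a column of them,
// and a "datum" that holds either a single scalar or a full column.
using Value = std::optional<double>;
using Column = std::vector<Value>;
using ToyDatum = std::variant<Value, Column>;

// Hypothetical helper: expand a scalar to `length` rows. A datum that is
// already a column is passed through unchanged (its size must match).
Column BroadcastToLength(const ToyDatum& datum, std::size_t length) {
  if (const auto* column = std::get_if<Column>(&datum)) {
    assert(column->size() == length);
    return *column;
  }
  // Scalar case: repeat the single (possibly null) value `length` times.
  return Column(length, std::get<Value>(datum));
}
```

In this model the projection step may hand back a single null `Value` for a missing column, and only the final stage pays the cost of materializing `length` rows by calling `BroadcastToLength`.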