Weston Pace created ARROW-15658:
-----------------------------------
Summary: [C++] Parquet pushdown filtering fails if the filter
expression uses numeric field references
Key: ARROW-15658
URL: https://issues.apache.org/jira/browse/ARROW-15658
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
We can refer to a field by name (e.g. {{compute::field_ref("foo")}}) or by
index (e.g. {{compute::field_ref(0)}}).
The latter is not supported when doing Parquet projection. A test
demonstrating this can be found here:
https://github.com/westonpace/arrow/commit/2f92ed0764cf2e1388dac053aeb4e1b923c6872e
Copied here for posterity (this would go in the dataset fixture mixin):
{code}
void TestScanWithFieldPathFilter() {
  auto i32 = field("i32", int32());
  auto i64 = field("i64", int64());
  this->opts_->dataset_schema = schema({i32, i64});
  this->Project({"i64"});
  // This should be the column i32
  this->SetFilter(equal(field_ref(0), literal(0)));

  auto expected_schema = schema({i64});
  auto reader = this->GetRecordBatchReader(opts_->dataset_schema);
  auto source = this->GetFileSource(reader.get());
  auto fragment = this->MakeFragment(*source);

  int64_t row_count = 0;
  for (auto maybe_batch : PhysicalBatches(fragment)) {
    ASSERT_OK_AND_ASSIGN(auto batch, maybe_batch);
    row_count += batch->num_rows();
    AssertSchemaEqual(*batch->schema(), *expected_schema,
                      /*check_metadata=*/false);
  }
  ASSERT_EQ(row_count, expected_rows());
}
{code}
I would expect this to work. Instead I get the error:
{noformat}
/home/pace/dev/arrow/cpp/src/arrow/dataset/test_util.h:840: Failure
Failed
'_error_or_value83.status()' failed with NotImplemented: Inferring column
projection from FieldRef FieldRef.FieldPath(0)
/home/pace/dev/arrow/cpp/src/arrow/dataset/file_parquet.cc:262
ResolveOneFieldRef(manifest, ref, field_lookup, duplicate_fields,
&columns_selection)
/home/pace/dev/arrow/cpp/src/arrow/dataset/file_parquet.cc:437
InferColumnProjection(*reader, *options)
/home/pace/dev/arrow/cpp/src/arrow/util/iterator.h:152 value_.status()
{noformat}
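One possible direction, sketched here with plain standard-library types rather than Arrow's own classes, would be to resolve an index-based reference against the dataset schema into a field name first, so that it can then follow the same path as a name-based reference. Everything in this sketch ({{SimpleFieldRef}}, {{ByName}}, {{ByIndex}}, {{ResolveToName}}, {{ColumnsToRead}}) is a hypothetical name made up for illustration, not part of the Arrow API, and nested field paths are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for a field reference: either a column name or a
// top-level index into the dataset schema. This is NOT arrow::FieldRef.
using SimpleFieldRef = std::variant<std::string, std::size_t>;

// Small helpers so call sites state explicitly which kind of reference they build.
SimpleFieldRef ByName(std::string name) { return SimpleFieldRef(std::move(name)); }
SimpleFieldRef ByIndex(std::size_t index) { return SimpleFieldRef(index); }

// Resolve a reference against the dataset schema's top-level field names.
// Returns std::nullopt if the name is unknown or the index is out of range.
std::optional<std::string> ResolveToName(
    const SimpleFieldRef& ref, const std::vector<std::string>& dataset_fields) {
  if (const auto* name = std::get_if<std::string>(&ref)) {
    const bool found = std::find(dataset_fields.begin(), dataset_fields.end(),
                                 *name) != dataset_fields.end();
    return found ? std::optional<std::string>(*name) : std::nullopt;
  }
  const std::size_t index = std::get<std::size_t>(ref);
  if (index >= dataset_fields.size()) return std::nullopt;
  return dataset_fields[index];
}

// Collect the columns a scan would need to read: the projected columns first,
// then any column referenced by the filter that is not already present.
// Returns std::nullopt if any filter reference cannot be resolved.
std::optional<std::vector<std::string>> ColumnsToRead(
    const std::vector<std::string>& projected,
    const std::vector<SimpleFieldRef>& filter_refs,
    const std::vector<std::string>& dataset_fields) {
  std::vector<std::string> columns = projected;
  for (const auto& ref : filter_refs) {
    std::optional<std::string> name = ResolveToName(ref, dataset_fields);
    if (!name) return std::nullopt;
    if (std::find(columns.begin(), columns.end(), *name) == columns.end()) {
      columns.push_back(*name);
    }
  }
  return columns;
}
```

With the schema from the test above ({{i32}}, {{i64}}), a projection of {{i64}} and a filter on {{ByIndex(0)}}, this sketch would select both {{i64}} and {{i32}} for reading, which is the behaviour the test expects from the real scanner.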