jorisvandenbossche commented on code in PR #19706:
URL: https://github.com/apache/arrow/pull/19706#discussion_r1073391100


##########
r/src/expression.cpp:
##########
@@ -46,13 +46,26 @@ std::shared_ptr<compute::Expression> compute___expr__call(std::string func_name,
       compute::call(std::move(func_name), std::move(arguments), std::move(options_ptr)));
 }
 
+// [[arrow::export]]
+bool compute___expr__is_field_ref(const std::shared_ptr<compute::Expression>& x) {
+  return x->field_ref() != nullptr;
+}
+
 // [[arrow::export]]
 std::vector<std::string> field_names_in_expression(
     const std::shared_ptr<compute::Expression>& x) {
   std::vector<std::string> out;
+  std::vector<arrow::FieldRef> nested;
+
   auto field_refs = FieldsInExpression(*x);
   for (auto f : field_refs) {
-    out.push_back(*f.name());
+    if (f.IsNested()) {
+      // We keep the top-level field name.

Review Comment:
   You can also specify field refs (or, more generally, arbitrary expressions), 
but then you also need to pass the resulting names for the schema. See the 
second `Project` signature at 
   
   
https://github.com/apache/arrow/blob/4e439f6a597180c5fc8ff1552c860cecd33736c5/cpp/src/arrow/dataset/scanner.h#L463-L484
   
   which gets translated to `ScanOptions.projection`. That also seems to be what 
the R bindings actually do inside `ExecNode_Scan` (it converts the 
`materialized_field_names` back to FieldRefs). For now, the scanner itself also 
uses only the top-level name of a nested field ref when pruning what it needs 
to read, so preserving the nested field ref is not useful yet. Ideally, though, 
we would optimize that in the future for formats that support it (like 
Parquet, cf. https://github.com/apache/arrow/issues/33167).
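   
   To illustrate the point about pruning by top-level name, here is a minimal 
standalone sketch. It does not use Arrow's API: `FieldPath` and `TopLevelNames` 
are hypothetical names invented for this example, and dropping duplicate 
top-level names is my own assumption, not something the scanner is confirmed 
to do this way.
   
```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a nested field reference: a path of names,
// e.g. {"a", "b"} for a.b. This is NOT arrow::FieldRef.
using FieldPath = std::vector<std::string>;

// Collect the distinct top-level column names, in first-seen order.
// Only the first path component matters, so a.b and a.c both map to "a".
std::vector<std::string> TopLevelNames(const std::vector<FieldPath>& refs) {
  std::vector<std::string> out;
  std::set<std::string> seen;
  for (const auto& path : refs) {
    if (path.empty()) continue;  // an empty path names no column
    const std::string& top = path.front();
    // insert().second is true only the first time a name is seen
    if (seen.insert(top).second) out.push_back(top);
  }
  return out;
}
```
   
   With refs for `a.b`, `a.c` and `d`, this yields the columns `a` and `d`, 
so the nested part of each ref has no influence on what gets read.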
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

