lidavidm commented on a change in pull request #9810:
URL: https://github.com/apache/arrow/pull/9810#discussion_r604880136



##########
File path: cpp/examples/arrow/dataset-documentation-example.cc
##########
@@ -217,24 +229,29 @@ std::shared_ptr<arrow::Table> SelectAndProjectDataset(
   auto scan_builder = dataset->NewScan().ValueOrDie();
   std::vector<std::string> names;
   std::vector<ds::Expression> exprs;
+  // Read all the original columns.
   for (const auto& field : dataset->schema()->fields()) {
     names.push_back(field->name());
     exprs.push_back(ds::field_ref(field->name()));
   }
+  // Also derive a new column.
   names.push_back("b_large");
   exprs.push_back(ds::greater(ds::field_ref("b"), ds::literal(1)));
   ABORT_ON_FAILURE(scan_builder->Project(exprs, names));

Review comment:
       Ah, the example here uses Project to define new virtual columns from 
the physical columns, while your snippet simply selects a subset of the 
physical columns to read. The former API is a superset of the latter. There's 
an example of what your snippet does a little earlier in this file. 
I've added explanatory comments to both examples, though I think this should 
also be clear in the prose.
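
To illustrate why projecting with expressions is a superset of selecting columns, here is a minimal, self-contained sketch. It deliberately does not use the Arrow API: `Column`, `Table`, `Expr`, `FieldRef`, `GreaterThan`, `Project`, and `Select` are all hypothetical stand-ins invented for this illustration, operating on plain integer vectors. The point is only that a column selection can be written as a projection made up entirely of field references.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins, NOT Arrow types: a column is a vector of integers and a
// table maps column names to columns.
using Column = std::vector<std::int64_t>;
using Table = std::map<std::string, Column>;
// An expression computes one output column from the input table.
using Expr = std::function<Column(const Table&)>;

// Hypothetical analogue of ds::field_ref: pass a physical column through.
Expr FieldRef(const std::string& name) {
  return [name](const Table& t) { return t.at(name); };
}

// Hypothetical analogue of ds::greater(ds::field_ref(name), ds::literal(x)):
// yields 1 where the value exceeds the threshold, else 0.
Expr GreaterThan(const std::string& name, std::int64_t threshold) {
  return [name, threshold](const Table& t) {
    Column out;
    for (std::int64_t v : t.at(name)) out.push_back(v > threshold ? 1 : 0);
    return out;
  };
}

// General form: every output column is the result of an expression.
Table Project(const Table& in, const std::vector<Expr>& exprs,
              const std::vector<std::string>& names) {
  assert(exprs.size() == names.size());
  Table out;
  for (std::size_t i = 0; i < exprs.size(); ++i) out[names[i]] = exprs[i](in);
  return out;
}

// Special case: selecting a subset of columns is a projection that consists
// only of field references.
Table Select(const Table& in, const std::vector<std::string>& names) {
  std::vector<Expr> exprs;
  for (const auto& name : names) exprs.push_back(FieldRef(name));
  return Project(in, exprs, names);
}
```

In this sketch, `Select(t, {"b"})` reads only column `b`, while `Project(t, {FieldRef("a"), FieldRef("b"), GreaterThan("b", 1)}, {"a", "b", "b_large"})` keeps both original columns and derives `b_large`, mirroring the shape of the example in the diff.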





