lidavidm commented on a change in pull request #11704:
URL: https://github.com/apache/arrow/pull/11704#discussion_r750282835
##########
File path: cpp/src/arrow/dataset/file_csv.cc
##########
@@ -111,9 +111,26 @@ static inline Result<csv::ConvertOptions> GetConvertOptions(
   if (!scan_options) return convert_options;
-  auto materialized = scan_options->MaterializedFields();
-  std::unordered_set<std::string> materialized_fields(materialized.begin(),
-                                                      materialized.end());
+  auto field_refs = scan_options->MaterializedFields();
+  std::unordered_set<std::string> materialized_fields;
+  materialized_fields.reserve(field_refs.size());
+  // Preprocess field refs. We try to avoid FieldRef::GetFoo here since that's
+  // quadratic (and this is significant overhead with 1000+ columns)
+  for (const auto& ref : field_refs) {
+    if (const auto* name = ref.name()) {
Review comment:
I'll write these types out explicitly. I used `const auto*` to stress that we're
getting a raw pointer here instead of a reference, but it's probably best to be
explicit about the types in that case anyway.
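A minimal C++ sketch of the pattern under discussion, assuming (as the comment indicates) that `name()` returns a raw pointer that is null when the ref is not a plain field name. `FieldRefLike` and `CollectNames` are hypothetical stand-ins, not Arrow's `FieldRef` API; the sketch only shows the explicit `const std::string*` spelling in the `if` initializer and the single linear pass that fills the set, so that each later membership check is one hash lookup rather than a scan over all refs.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for a field reference: either a plain name or a
// positional index. Only the shape of name() matters for this sketch.
class FieldRefLike {
 public:
  explicit FieldRefLike(std::string name) : impl_(std::move(name)) {}
  explicit FieldRefLike(int index) : impl_(index) {}

  // Returns a pointer to the stored name, or nullptr if this ref is an index.
  const std::string* name() const { return std::get_if<std::string>(&impl_); }

 private:
  std::variant<std::string, int> impl_;
};

// Hypothetical helper: collect the names of all name-based refs in one
// linear pass over the input.
std::unordered_set<std::string> CollectNames(
    const std::vector<FieldRefLike>& refs) {
  std::unordered_set<std::string> names;
  names.reserve(refs.size());
  for (const FieldRefLike& ref : refs) {
    // Spelling out `const std::string*` (rather than `const auto*`) makes it
    // obvious that `name` is a possibly-null pointer, not a reference.
    if (const std::string* name = ref.name()) {
      names.insert(*name);
    }
  }
  return names;
}
```

Refs that are not plain names (the index-based ones here) are simply skipped; a real implementation would still need to decide how to handle them.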
--
This is an automated message from the Apache Git Service.