bkietz commented on a change in pull request #9670: URL: https://github.com/apache/arrow/pull/9670#discussion_r591894657
########## File path: cpp/src/arrow/dataset/file_base.cc ########## @@ -137,16 +132,174 @@ std::string FileSystemDataset::ToString() const { return repr; } -Result<FragmentIterator> FileSystemDataset::GetFragmentsImpl(Expression predicate) { - FragmentVector fragments; +namespace { - for (const auto& fragment : fragments_) { - ARROW_ASSIGN_OR_RAISE( - auto simplified, - SimplifyWithGuarantee(predicate, fragment->partition_expression())); - if (simplified.IsSatisfiable()) { - fragments.push_back(fragment); +// Helper class for efficiently detecting subtrees given fragment partition expressions. +// Partition expressions are broken into conjunction members and each member dictionary +// encoded to impose a sortable ordering. In addition, subtrees are generated which span +// groups of fragments and nested subtrees. After encoding each fragment is guaranteed to +// be a descendant of at least one subtree. For example, given fragments in a +// HivePartitioning with paths: +// +// /num=0/al=eh/dat.par +// /num=0/al=be/dat.par +// /num=1/al=eh/dat.par +// /num=1/al=be/dat.par +// +// The following subtrees will be introduced: +// +// /num=0/ +// /num=0/al=eh/ +// /num=0/al=eh/dat.par +// /num=0/al=be/ +// /num=0/al=be/dat.par +// /num=1/ +// /num=1/al=eh/ +// /num=1/al=eh/dat.par +// /num=1/al=be/ +// /num=1/al=be/dat.par +struct SubtreeImpl { + using expression_code = char32_t; Review comment: Also: basic_string has the short string approximation in most standard libraries (so a string with as many as 4 `expression_code`s will probably be stored without allocation) and supports hashing out of the box ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org