bkietz commented on a change in pull request #9670:
URL: https://github.com/apache/arrow/pull/9670#discussion_r591894657
##########
File path: cpp/src/arrow/dataset/file_base.cc
##########
@@ -137,16 +132,174 @@ std::string FileSystemDataset::ToString() const {
return repr;
}
-Result<FragmentIterator> FileSystemDataset::GetFragmentsImpl(Expression predicate) {
- FragmentVector fragments;
+namespace {
- for (const auto& fragment : fragments_) {
- ARROW_ASSIGN_OR_RAISE(
- auto simplified,
- SimplifyWithGuarantee(predicate, fragment->partition_expression()));
- if (simplified.IsSatisfiable()) {
- fragments.push_back(fragment);
+// Helper class for efficiently detecting subtrees given fragment partition expressions.
+// Partition expressions are broken into conjunction members and each member dictionary
+// encoded to impose a sortable ordering. In addition, subtrees are generated which span
+// groups of fragments and nested subtrees. After encoding each fragment is guaranteed to
+// be a descendant of at least one subtree. For example, given fragments in a
+// HivePartitioning with paths:
+//
+// /num=0/al=eh/dat.par
+// /num=0/al=be/dat.par
+// /num=1/al=eh/dat.par
+// /num=1/al=be/dat.par
+//
+// The following subtrees will be introduced:
+//
+// /num=0/
+// /num=0/al=eh/
+// /num=0/al=eh/dat.par
+// /num=0/al=be/
+// /num=0/al=be/dat.par
+// /num=1/
+// /num=1/al=eh/
+// /num=1/al=eh/dat.par
+// /num=1/al=be/
+// /num=1/al=be/dat.par
+struct SubtreeImpl {
+ using expression_code = char32_t;
Review comment:
Also: `basic_string` has the small string optimization in most standard
libraries (so a string with as many as 4 `expression_code`s will probably be
stored without allocation), and it supports hashing out of the box.
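
To make the suggestion concrete, here is a minimal sketch of what using `basic_string<expression_code>` as a hash key could look like. `EncodedPath` and `PathCounter` are hypothetical names for illustration only and are not part of the PR; the only thing taken from the diff is the `expression_code` alias.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Assumption: mirrors the `expression_code` alias from the diff above.
using expression_code = char32_t;

// Hypothetical alias: an encoded partition expression as a sequence of codes.
// With the default traits and allocator this is the same type as std::u32string.
using EncodedPath = std::basic_string<expression_code>;

// Hypothetical helper: counts how often each encoded path has been seen.
// The standard library provides std::hash<std::u32string>, so the map
// needs no custom hasher or equality functor.
class PathCounter {
 public:
  // Records one occurrence of `path` and returns the updated count.
  std::size_t Add(const EncodedPath& path) { return ++counts_[path]; }

  // Returns how many times `path` was added (0 if never).
  std::size_t Count(const EncodedPath& path) const {
    auto it = counts_.find(path);
    return it == counts_.end() ? 0 : it->second;
  }

 private:
  std::unordered_map<EncodedPath, std::size_t> counts_;
};
```

Whether a short `EncodedPath` really avoids a heap allocation depends on the standard library in use; the inline capacity is implementation-defined, so this is worth measuring rather than assuming.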