sanjibansg commented on code in PR #12977:
URL: https://github.com/apache/arrow/pull/12977#discussion_r859106787
##########
cpp/src/arrow/dataset/discovery.cc:
##########
@@ -278,8 +278,13 @@ Result<std::shared_ptr<Dataset>> FileSystemDatasetFactory::Finish(FinishOptions
}
std::vector<std::shared_ptr<FileFragment>> fragments;
+ std::string fixed_path;
for (const auto& info : files_) {
- auto fixed_path = StripPrefixAndFilename(info.path(), options_.partition_base_dir);
+ if (partitioning->type_name() == "filename") {
+ fixed_path = StripPrefix(info.path(), options_.partition_base_dir);
+ } else {
+ fixed_path = StripPrefixAndFilename(info.path(), options_.partition_base_dir);
+ }
Review Comment:
With the latest change, `StripPrefixAndFilename()` returns a
`PartitionPathFormat` object that holds both the directory and the filename
prefix. Both parts are then passed to `Parse()`, which now expects the
directory as well as the filename prefix.
We could also change `Parse()` to accept a `PartitionPathFormat` object
directly, which would make it symmetrical with `Format()`. That would require
matching changes in PyArrow, though, and I believe callers there would then
have to construct a `PartitionPathFormat` object before they could call
`partitioning.parse()`.
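To make the shape of that proposal concrete, here is a minimal, self-contained
sketch. It is not the code in this PR: the `sketch` namespace, the field names
of the stand-in `PartitionPathFormat`, the `ParseSegments()` helper, and the
`_` separator for filename prefixes are all assumptions made for illustration.

```cpp
#include <string>
#include <string_view>
#include <vector>

namespace sketch {

// Hypothetical stand-in for the PartitionPathFormat discussed above.
// The field names are assumptions, not taken from the PR.
struct PartitionPathFormat {
  std::string directory;  // e.g. "year=2020/month=01"
  std::string filename;   // e.g. "2020_01_part-0.parquet"
};

// Strip `base_dir` from `path`, then split the rest into directory and
// filename, so one call serves directory- and filename-based partitionings.
inline PartitionPathFormat StripPrefixAndFilename(std::string_view path,
                                                  std::string_view base_dir) {
  // Only strip base_dir when it matches a whole leading path component.
  if (!base_dir.empty() && path.substr(0, base_dir.size()) == base_dir) {
    const bool at_boundary = path.size() == base_dir.size() ||
                             base_dir.back() == '/' ||
                             path[base_dir.size()] == '/';
    if (at_boundary) path.remove_prefix(base_dir.size());
  }
  while (!path.empty() && path.front() == '/') path.remove_prefix(1);

  const auto slash = path.rfind('/');
  if (slash == std::string_view::npos) {
    return {std::string(), std::string(path)};
  }
  return {std::string(path.substr(0, slash)),
          std::string(path.substr(slash + 1))};
}

// Split `s` on `sep`, skipping empty segments.
inline std::vector<std::string> Split(std::string_view s, char sep) {
  std::vector<std::string> out;
  while (!s.empty()) {
    const auto pos = s.find(sep);
    if (pos == std::string_view::npos) {
      out.emplace_back(s);
      break;
    }
    if (pos > 0) out.emplace_back(s.substr(0, pos));
    s.remove_prefix(pos + 1);
  }
  return out;
}

// Hypothetical Parse() variant that takes the whole struct. A filename-based
// partitioning reads the filename up to its last '_' (assumed separator);
// any other partitioning reads the directory.
inline std::vector<std::string> ParseSegments(const PartitionPathFormat& path,
                                              bool from_filename) {
  if (!from_filename) return Split(path.directory, '/');
  const auto last = path.filename.rfind('_');
  if (last == std::string::npos) return {};
  return Split(std::string_view(path.filename).substr(0, last), '_');
}

}  // namespace sketch
```

With a signature like this, the loop in `Finish()` would not need to branch on
`partitioning->type_name()`: it could call `StripPrefixAndFilename()` once and
hand the result to the partitioning, which picks the part it needs.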