marsupialtail commented on code in PR #13830:
URL: https://github.com/apache/arrow/pull/13830#discussion_r944829495
##########
cpp/src/arrow/dataset/file_base.cc:
##########
@@ -89,6 +89,28 @@ Result<std::shared_ptr<io::InputStream>> FileSource::OpenCompressed(
   return io::CompressedInputStream::Make(codec.get(), std::move(file));
 }
+Result<std::shared_ptr<io::InputStream>> FileSource::OpenRange(int64_t start,
+                                                               int64_t end) const {
Review Comment:
I understand your use case to be: you have one Parquet file, you want to
specify a byte range, and you want to read all the row groups that fit in that
byte range.

With this API, I don't think you need to split a single ParquetFileFragment
into several fragments, nor do you need a new C++ function. We can fold your
changes into file_parquet.cc and have ParquetFileFragment interpret the start
and end bytes the way you specified, instead of insisting that they align with
row group boundaries.
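
To make the suggestion concrete, here is a minimal, standalone sketch of how a byte range could be mapped to row groups. It does not use the Arrow or Parquet APIs: `RowGroupSpan` and `RowGroupsInRange` are hypothetical names introduced only for illustration, and it assumes "fit in that byte range" means the row group lies entirely inside the half-open range [start, end). In a real implementation the offsets and sizes would come from the Parquet footer metadata.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of where one row group lives in the file.
// In a real Parquet file these values would come from the footer metadata.
struct RowGroupSpan {
  int64_t offset;  // byte offset of the first byte of the row group
  int64_t length;  // total size of the row group in bytes
};

// Returns the indices of the row groups that lie entirely within [start, end).
// Assumption: a row group "fits" only if it is fully contained in the range;
// the range itself does not need to align with row group boundaries.
std::vector<int> RowGroupsInRange(const std::vector<RowGroupSpan>& row_groups,
                                  int64_t start, int64_t end) {
  assert(start <= end);
  std::vector<int> selected;
  for (int i = 0; i < static_cast<int>(row_groups.size()); ++i) {
    const RowGroupSpan& rg = row_groups[i];
    if (rg.offset >= start && rg.offset + rg.length <= end) {
      selected.push_back(i);
    }
  }
  return selected;
}
```

With row groups at bytes 4-104, 104-204, and 204-304, a range of [0, 250) would select the first two row groups and skip the third, because the third extends past the end of the range. A different containment rule (for example, selecting by where the row group starts) would only change the condition inside the loop.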