[ https://issues.apache.org/jira/browse/ARROW-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche resolved ARROW-8201.
------------------------------------------
    Fix Version/s: 10.0.0
       Resolution: Fixed

Issue resolved by pull request 14301
[https://github.com/apache/arrow/pull/14301]

> [Python][Dataset] Improve ergonomics of FileFragment
> ----------------------------------------------------
>
>                 Key: ARROW-8201
>                 URL: https://issues.apache.org/jira/browse/ARROW-8201
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.16.0
>            Reporter: Ben Kietzman
>            Assignee: Miles Granger
>            Priority: Major
>              Labels: dataset, pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> FileFragment can be made more directly useful by adding convenience methods.
> For example, a FileFragment could allow underlying file/buffer to be opened
> directly:
> {code}
> def open(self):
>     """
>     Open a NativeFile of the buffer or file viewed by this fragment.
>     """
>     cdef:
>         CFileSystem* c_filesystem
>         shared_ptr[CRandomAccessFile] opened
>         NativeFile out = NativeFile()
>     buf = self.buffer
>     if buf is not None:
>         return pa.io.BufferReader(buf)
>     with nogil:
>         c_filesystem = self.file_fragment.source().filesystem()
>         opened = GetResultValue(c_filesystem.OpenInputFile(
>             self.file_fragment.source().path()))
>     out.set_random_access_file(opened)
>     out.is_readable = True
>     return out
> {code}
> Additionally, a ParquetFileFragment's metadata could be introspectable:
> {code}
> @property
> def metadata(self):
>     from pyarrow._parquet import ParquetReader
>     reader = ParquetReader()
>     reader.open(self.open())
>     return reader.metadata
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)