aiqc opened a new issue #9932: URL: https://github.com/apache/arrow/issues/9932
Hi there. Is there a way to read only a range of rows (by row index) from a Parquet file? I don't see one in `pyarrow.parquet.read_table`, and there are related questions on Stack Overflow:

- https://stackoverflow.com/questions/64050609/pyarrow-read-parquet-via-column-index-or-order
- https://stackoverflow.com/questions/62252259/pandas-read-write-parquet-data-using-column-index

Right now I read the whole file in and then drop rows, which won't be feasible on larger datasets.

```
import pandas as pd

df = pd.read_parquet(my_stream)  # loads every row into memory
df = df.iloc[samples_indices]    # then keeps only the wanted rows
```

I feel like I could do this with Spark, but I don't want to add that dependency.
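For reference, a minimal sketch of one possible workaround, assuming the rows wanted form a contiguous range and the file was written with several row groups. It uses `pyarrow.parquet.ParquetFile` to read only the row groups that overlap the range and then trims the result. The helper names `plan_row_range` and `read_row_range` are hypothetical, not part of pyarrow, and this does not cover arbitrary `samples_indices`.

```python
from typing import List, Tuple


def plan_row_range(group_sizes: List[int], start: int, stop: int) -> List[Tuple[int, int, int]]:
    """Map a global row range [start, stop) onto row groups.

    Returns (group_index, offset_within_group, length) for each row group
    that overlaps the requested range.
    """
    plan = []
    group_start = 0
    for index, size in enumerate(group_sizes):
        group_stop = group_start + size
        lo = max(start, group_start)
        hi = min(stop, group_stop)
        if lo < hi:
            plan.append((index, lo - group_start, hi - lo))
        group_start = group_stop
    return plan


def read_row_range(source, start: int, stop: int):
    """Read rows [start, stop), touching only the overlapping row groups."""
    # Imported lazily so the planning helper above has no pyarrow dependency.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(source)
    metadata = parquet_file.metadata
    sizes = [metadata.row_group(i).num_rows for i in range(metadata.num_row_groups)]
    plan = plan_row_range(sizes, start, stop)
    if not plan:
        # Nothing overlaps: return an empty table with the file's schema.
        return parquet_file.schema_arrow.empty_table()
    table = parquet_file.read_row_groups([index for index, _, _ in plan])
    # Drop the leading rows of the first group and any trailing rows of the last.
    first_offset = plan[0][1]
    total_rows = sum(length for _, _, length in plan)
    return table.slice(first_offset, total_rows)
```

Something like `read_row_range(my_stream, 1000, 2000).to_pandas()` would then replace the read-everything-then-`iloc` step. How much is saved depends on the row group size chosen when the file was written: a file with a single large row group still has to be read in full.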
