E-HO opened a new issue #10492: URL: https://github.com/apache/arrow/issues/10492
Hi,

I cannot file this as a direct bug, since it is more about the documentation being incomplete or out of date, in particular https://arrow.apache.org/docs/python/parquet.html#reading-a-parquet-file-from-azure-blob-storage . Maybe a few improvements could be added?

- The chapter "Writing to Partitioned Datasets" still presents a "solution" based on `hdfs.connect`, but since that API is marked as deprecated it is no longer a good idea to mention it (see the first sketch below).
- The chapter "Reading a Parquet File from Azure Blob storage" is based on the `azure.storage.blob` package, but on an old version of it: the current azure-sdk-for-python no longer has methods such as `get_blob_to_stream()`. Could this part be updated to the new blob storage client (see the second sketch below), and perhaps complemented by a section applying the same concept to Delta Lake (a similar principle, but with some differences)?
- There is a chapter "Reading from Partitioned Datasets", which is great, but it only works with local storage: pointing it at a Data Lake URL for a recursive folder does not work. The ability to read partitioned Parquet files from cloud storage is missing (see the third sketch below).

To illustrate what I mean, here are a few rough, untested sketches:
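The deprecated `hdfs.connect` call could presumably be replaced by the newer `pyarrow.fs.HadoopFileSystem`. A minimal sketch, assuming a reachable NameNode (the host, port and paths below are placeholders):

```python
import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow import fs

table = pa.table({"year": [2020, 2020, 2021], "value": [1, 2, 3]})

# Placeholder host/port; replace with the actual NameNode address.
hdfs = fs.HadoopFileSystem(host="namenode", port=8020)

# Write one Parquet file per "year" partition under the given root path.
pq.write_to_dataset(
    table,
    root_path="/data/partitioned_dataset",  # placeholder path
    partition_cols=["year"],
    filesystem=hdfs,
)
```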
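For the Azure Blob storage chapter, the v12 `azure-storage-blob` client no longer has `get_blob_to_stream()`; `download_blob()` is the replacement. A possible sketch, not verified end to end (connection string, container and blob names are placeholders):

```python
import io

import pyarrow.parquet as pq
from azure.storage.blob import BlobServiceClient

# Placeholder connection string and names.
service = BlobServiceClient.from_connection_string("<connection-string>")
blob_client = service.get_blob_client(container="my-container",
                                      blob="data/file.parquet")

# Download the blob into an in-memory stream and read it with pyarrow.
stream = io.BytesIO()
blob_client.download_blob().readinto(stream)
stream.seek(0)

table = pq.read_table(stream)
df = table.to_pandas()
```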
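For reading a partitioned dataset directly from cloud storage, something along the lines of `pyarrow.dataset` combined with an fsspec filesystem (here the third-party `adlfs` package for Azure; account name, credential and path are placeholders) might be worth documenting:

```python
import pyarrow.dataset as ds
from adlfs import AzureBlobFileSystem

# Placeholder account name and credential.
abfs = AzureBlobFileSystem(account_name="<account>",
                           credential="<key-or-sas-token>")

# Discover the Hive-style partitioned Parquet files under the container path.
dataset = ds.dataset(
    "my-container/partitioned_dataset",  # placeholder path inside the container
    filesystem=abfs,
    format="parquet",
    partitioning="hive",
)
table = dataset.to_table()
```

Thanks,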