[GitHub] [arrow] E-HO opened a new issue #10492: Doc update ? For Reading and Writing the Apache Parquet Format

GitBox Wed, 09 Jun 2021 03:47:13 -0700


E-HO opened a new issue #10492:
URL: https://github.com/apache/arrow/issues/10492



   Hi,
   
   I am not filing this as a bug because it is not a defect in the code. Rather, 
the documentation is incomplete or out of date, especially this section: 
https://arrow.apache.org/docs/python/parquet.html#reading-a-parquet-file-from-azure-blob-storage
   
   Would it be possible to make the following improvements?
   
   - The chapter "Writing to Partitioned Datasets" still presents a solution 
based on "hdfs.connect". Since that function is marked as deprecated, the 
documentation should no longer recommend it.
   - The chapter "Reading a Parquet File from Azure Blob storage" is based on 
an old version of the "azure.storage.blob" package. The current 
"azure-sdk-for-python" no longer has methods like get_blob_to_stream(). 
Would it be possible to update this part for the current Blob Storage API, and 
to add another example covering the same concept with Delta Lake (the 
principle is similar, but there are differences ...)?
   - The chapter "Reading from Partitioned Datasets" is great, but the example 
works only with local storage. Passing a Data Lake URL that points to a 
nested folder does not work, so the ability to read partitioned Parquet 
files from cloud storage is missing.
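   
   To illustrate the last two points, here is a rough sketch of what updated 
examples might look like. It is not taken from the Arrow documentation. It 
assumes the v12 "azure-storage-blob" package for the single-file case and an 
fsspec-compatible filesystem object (for example one from the "adlfs" 
package) for the partitioned case. The function names, the connection string, 
the container name and the paths are placeholders chosen for illustration.

```python
import io


def read_parquet_blob(conn_str, container, blob_name):
    """Download one Parquet blob into memory and return it as a pyarrow Table.

    Sketch only: assumes the v12 azure-storage-blob package, where
    download_blob() replaces the old get_blob_to_stream() method.
    """
    # Imported lazily so the sketch can be loaded without these packages.
    import pyarrow.parquet as pq
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(conn_str)
    blob_client = service.get_blob_client(container=container, blob=blob_name)

    # Stream the blob contents into an in-memory buffer.
    stream = io.BytesIO()
    blob_client.download_blob().readinto(stream)
    stream.seek(0)

    return pq.read_table(stream)


def read_partitioned_dataset(root_path, filesystem):
    """Read a hive-partitioned Parquet directory through a filesystem object.

    Sketch only: `filesystem` is assumed to be a pyarrow filesystem or an
    fsspec-compatible one, e.g. (assumption) one created with
    adlfs.AzureBlobFileSystem(account_name=..., account_key=...).
    `root_path` is a placeholder such as "mycontainer/path/to/dataset".
    """
    import pyarrow.dataset as ds

    dataset = ds.dataset(
        root_path,
        filesystem=filesystem,
        format="parquet",
        partitioning="hive",  # directories named like year=2021/month=6
    )
    return dataset.to_table()
```

   The second function takes the filesystem as an argument so the same call 
could be tried against local storage first and a cloud filesystem afterwards.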
   
   Thanks,


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
