westonpace commented on issue #10492:
URL: https://github.com/apache/arrow/issues/10492#issuecomment-857888278


   > Cannot submit a bug since it's not especially a direct issue but it's more 
something not complete or up to date in the documentation
   
   Please do create a JIRA issue.  Arrow uses JIRA to track all changes (bugs, 
documentation changes, CI improvements, new features), so it does not matter 
that this is not strictly a bug.  These sound like valid concerns, and a JIRA 
issue for them would be appropriate.
   
   > There is a chapter for "Reading from Partitioned Datasets", that's great 
... but works only with a local storage and adding a Data Lake URL to a 
recursive folder don't work, missing the ability to read partitioned parquet 
files from Cloud
   
   That chapter describes the legacy datasets API (ParquetDataset).  You may 
be better served by reading up on the new datasets API: 
https://arrow.apache.org/docs/python/dataset.html#dataset .  The new API will 
accept a URL as a path, although it currently only has first-class support for 
S3 and HDFS.  To use Azure Data Lake directly you would need to create a 
filesystem implementation for it, because the datasets API needs to be able to 
list files, search for files, create files, etc.
   
   That being said, you might be able to make something work by combining 
Arrow's support for fsspec-compatible filesystems 
(https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems)
 with adlfs (https://github.com/dask/adlfs), an fsspec implementation for 
Azure storage.

