kcyea commented on issue #1510:
URL: https://github.com/apache/arrow/issues/1510#issuecomment-724494811


   Can we read a list of Parquet files under a subdirectory of a container in
Azure Blob Storage? For example, in the container "test_container" there are
several files under the subdirectory
"test.parquet/department=HR/date=2020-01-01/".
   
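A minimal sketch of the first question. It assumes the store behaves like a simplekv key-value store (storefact builds simplekv stores, which expose `keys(prefix=...)`). The sketch lists the blob names under the partition prefix, keeps only the Parquet files, and recovers the `department`/`date` values from the `name=value` path segments. The helper names `select_parquet_keys` and `parse_hive_partitions` are hypothetical, and the listing is made up for illustration.

```python
from urllib.parse import unquote


def select_parquet_keys(keys, prefix):
    """Keep only Parquet object keys that live under the given prefix."""
    prefix = prefix.rstrip("/") + "/"
    # Skips marker files such as _SUCCESS that writers like Spark leave behind.
    return sorted(k for k in keys if k.startswith(prefix) and k.endswith(".parquet"))


def parse_hive_partitions(key, root):
    """Return {column: value} from 'name=value' directory segments below root."""
    relative = key[len(root):].strip("/")
    # Every segment except the final file name may encode a partition value.
    segments = relative.split("/")[:-1]
    partitions = {}
    for segment in segments:
        name, sep, value = segment.partition("=")
        if sep:
            partitions[name] = unquote(value)
    return partitions


if __name__ == "__main__":
    # Made-up listing; with storefact this might come from
    # store.keys(prefix="test.parquet/department=HR/date=2020-01-01/").
    listing = [
        "test.parquet/department=HR/date=2020-01-01/part-0.parquet",
        "test.parquet/department=HR/date=2020-01-01/_SUCCESS",
        "test.parquet/department=IT/date=2020-01-01/part-0.parquet",
    ]
    for key in select_parquet_keys(listing, "test.parquet/department=HR/date=2020-01-01/"):
        print(key, parse_hive_partitions(key, "test.parquet"))
```

Each selected key could then be opened with `store.open(key)`, read with `pq.read_table(...)`, and the pieces combined with `pa.concat_tables`, adding the partition values as columns if they are needed. Alternatively, `pyarrow.dataset.dataset(..., format="parquet", partitioning="hive")` given an fsspec Azure filesystem (for example from the adlfs package) may handle discovery and partition parsing in one step; verify that against the pyarrow version in use.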
   Besides that, I tested the performance of reading a 298 MB Parquet file
from Azure Blob Storage on my local machine, selecting 3 columns to return. It
took about 2 minutes to return the result. Can this be improved to less than
one minute?
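For scale: 298 MB in about 120 s is roughly 2.5 MB/s, so finishing in under 60 s needs about 5 MB/s if the whole blob crosses the network. One generic way to raise throughput on a high-latency link is to request several byte ranges concurrently and parse the reassembled buffer locally. The sketch below shows that technique; it is not a storefact or pyarrow feature, and `read_range(offset, length)` is a hypothetical callable you would wire to your client's ranged-read call.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_download(read_range, total_size, chunk_size=8 * 1024 * 1024, workers=8):
    """Fetch bytes [0, total_size) as fixed-size ranges concurrently.

    read_range(offset, length) -> bytes is a hypothetical callable; connect it
    to whatever ranged-read call your storage client provides.
    """
    offsets = range(0, total_size, chunk_size)

    def fetch(offset):
        # The last chunk may be shorter than chunk_size.
        return read_range(offset, min(chunk_size, total_size - offset))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so chunks stay in file order.
        chunks = list(pool.map(fetch, offsets))
    return b"".join(chunks)


if __name__ == "__main__":
    blob = bytes(range(256)) * 4096  # 1 MiB of in-memory stand-in data
    data = parallel_download(lambda off, n: blob[off:off + n], len(blob), chunk_size=64 * 1024)
    print(len(data), data == blob)
```

The resulting bytes could be passed to `pq.read_table(pa.BufferReader(data), columns=columns)`. Note the trade-off: this transfers the whole file, whereas a seekable reader may fetch only the selected column chunks, so whether it wins depends on how much of the file the three columns occupy and on the link's latency.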
   
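   from storefact import get_store
   import pyarrow.parquet as pq

   params = {
       'account_name': 'test',
       'account_key': 'XXXsome_azure_account_keyXXX',
       'container': 'my-azure-container',
       'create_if_missing': False,
   }
   store = get_store('azure', **params)

   # Hypothetical placeholders: set these to the real blob name and columns.
   key = '<path/to/file.parquet>'
   columns = ['<col_1>', '<col_2>', '<col_3>']

   # Open the blob as a file-like object and read only the selected columns.
   reader = store.open(key)
   df = pq.read_table(reader, columns=columns).to_pandas()
   print(df.head(10))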
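Before optimising, it helps to know where the 2 minutes go: network transfer, Parquet decoding, or the pandas conversion. A minimal timing sketch using only the standard library follows; the stage labels and the `time.sleep` calls are placeholders for the real steps.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Record wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


if __name__ == "__main__":
    timings = {}
    with stopwatch("download", timings):
        time.sleep(0.05)  # placeholder for fetching the blob
    with stopwatch("parse", timings):
        time.sleep(0.02)  # placeholder for pq.read_table(...)
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.3f} s")
```

In the real script, the stages would wrap something like `data = store.get(key)`, `pq.read_table(pa.BufferReader(data), columns=columns)`, and `.to_pandas()`. If the download dominates, network-side changes (parallel transfers, running closer to the storage account's region) are likely to matter more than reader settings.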
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
