kcyea commented on issue #1510:
URL: https://github.com/apache/arrow/issues/1510#issuecomment-724494811
Can we read a set of parquet files stored under a subdirectory of a container in
Azure blob storage? For example, the container "test_container" holds several
files under the subdirectory
"test.parquet/department=HR/date=2020-01-01/".
Besides that, I tested the performance of reading a 298MB parquet file
from Azure blob storage on my local machine, selecting 3 columns to
return. It took about 2 minutes to return the result. Can this be improved to
less than one minute?
from storefact import get_store
import pyarrow as pa
import pyarrow.parquet as pq

params = {
    'account_name': 'test',
    'account_key': 'XXXsome_azure_account_keyXXX',
    'container': 'my-azure-container',
    'create_if_missing': False
}
store = get_store('azure', **params)

# `key` (the blob name) and `columns` (the column names to read) are defined elsewhere
reader = store.open(key)
table = pq.read_table(reader, columns=columns).to_pandas()
print(table.head(10))
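
For the subdirectory case, one approach I can imagine is sketched below. It is only a rough sketch and not something I have verified against Azure. It assumes the store object returned by get_store supports keys(prefix=...) as simplekv stores do, that the data files end in ".parquet", and that store.open(key) returns a seekable file-like object, as in the snippet above. The helper names partition_values, parquet_keys and read_partition are hypothetical and made up for illustration.

```python
def partition_values(key):
    """Parse hive-style 'name=value' path segments out of a blob key."""
    values = {}
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep:  # only segments that actually contain '=' are partitions
            values[name] = value
    return values


def parquet_keys(store, prefix):
    """Return the sorted keys under `prefix` that look like parquet files.

    Assumption: `store.keys(prefix=...)` exists, as on simplekv-style stores.
    """
    return sorted(k for k in store.keys(prefix=prefix) if k.endswith(".parquet"))


def read_partition(store, prefix, columns=None):
    """Read every parquet file under `prefix` and concatenate the results.

    pyarrow is imported here so the two helpers above can be used without it.
    Assumption: `store.open(key)` returns a seekable file-like object.
    """
    import pyarrow as pa
    import pyarrow.parquet as pq

    tables = [
        pq.read_table(store.open(key), columns=columns)
        for key in parquet_keys(store, prefix)
    ]
    # concat_tables needs at least one table, so an empty prefix raises here
    return pa.concat_tables(tables)
```

With the store from the snippet above, this would be called as read_partition(store, "test.parquet/department=HR/date=2020-01-01/", columns=columns).to_pandas(). Each file is still downloaded one at a time, so this alone would not be expected to reduce the read time.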