[ 
https://issues.apache.org/jira/browse/ARROW-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372967#comment-16372967
 ] 

ASF GitHub Bot commented on ARROW-2066:
---------------------------------------

xhochy commented on a change in pull request #1544: ARROW-2066: [Python] Document using pyarrow with Azure Blob Store
URL: https://github.com/apache/arrow/pull/1544#discussion_r169998387
 
 

 ##########
 File path: python/doc/source/parquet.rst
 ##########
 @@ -237,3 +237,44 @@ throughput:
 
    pq.read_table(where, nthreads=4)
    pq.ParquetDataset(where).read(nthreads=4)
+
+Reading a Parquet File from Azure Blob storage
+----------------------------------------------
+
+The code below shows how to use Azure's storage sdk along with pyarrow to read
+a parquet file into a Pandas dataframe.
+This is suitable for executing inside a Jupyter notebook running on a Python 3
+kernel.
+
+Dependencies: 
+
+* python 3.6.2 
+* azure-storage 0.36.0 
+* pyarrow 0.8.0 
+
+.. code-block:: python
+
+   import pyarrow.parquet as pq
+   import io
+   from azure.storage.blob import BlockBlobService
+
+   account_name = '...'
+   account_key = '...'
+   container_name = '...'
+   parquet_file = 'mysample.parquet'
+
+   block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
+   try:
+      block_blob_service.get_blob_to_stream(container_name=container_name, blob_name=parquet_file, stream=byte_stream)
+      pd = pq.read_table(source=byte_stream).to_pandas()
+      pd.head(10)
 
 Review comment:
   Better replace this with `# Do work on DF …`
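
   A minimal sketch of how the snippet might look with that change applied.
   It is an illustration, not the text that was merged. Assumptions: the
   quoted hunk does not show where `byte_stream` is created, so the sketch
   creates an `io.BytesIO()` itself; `read_blob` and its `reader` parameter
   are hypothetical names introduced here; the service object only needs the
   `get_blob_to_stream` method already used in the diff. The result is named
   `df` rather than `pd` so it does not collide with the usual pandas alias.

```python
import io


def read_blob(blob_service, container_name, blob_name, reader):
    """Download a blob into memory and pass the buffer to `reader`.

    `read_blob` is a hypothetical helper; `blob_service` is any object that
    offers get_blob_to_stream(container_name=..., blob_name=..., stream=...),
    as the BlockBlobService in the diff does.
    """
    byte_stream = io.BytesIO()
    try:
        # Same call as in the diff: writes the blob's bytes into byte_stream.
        blob_service.get_blob_to_stream(
            container_name=container_name,
            blob_name=blob_name,
            stream=byte_stream,
        )
        # Rewind so the reader starts at the first byte of the file.
        byte_stream.seek(0)
        return reader(byte_stream)
    finally:
        # Release the in-memory buffer even if the download or read fails.
        byte_stream.close()


# Usage with the names from the diff (assumes azure-storage and pyarrow
# are installed and block_blob_service has been created as shown there):
#
#   df = read_blob(block_blob_service, container_name, parquet_file,
#                  lambda s: pq.read_table(source=s).to_pandas())
#   # Do work on df ...
```

   Passing the reader as a callable keeps the download and cleanup logic
   independent of pyarrow, so the same helper could be exercised with a stub
   service in a test.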

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Document reading Parquet files from Azure Blob Store
> -------------------------------------------------------------
>
>                 Key: ARROW-2066
>                 URL: https://issues.apache.org/jira/browse/ARROW-2066
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> See https://github.com/apache/arrow/issues/1510



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
