[jira] [Commented] (ARROW-2066) [Python] Document reading Parquet files from Azure Blob Store

ASF GitHub Bot (JIRA) Thu, 22 Feb 2018 09:15:56 -0800

    [ 
https://issues.apache.org/jira/browse/ARROW-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373078#comment-16373078
 ]


ASF GitHub Bot commented on ARROW-2066:
---------------------------------------

rjrussell77 commented on a change in pull request #1544: ARROW-2066: [Python] Document using pyarrow with Azure Blob Store
URL: https://github.com/apache/arrow/pull/1544#discussion_r170027717
 
 

 ##########
 File path: python/doc/source/parquet.rst
 ##########
 @@ -237,3 +237,43 @@ throughput:
 
    pq.read_table(where, nthreads=4)
    pq.ParquetDataset(where).read(nthreads=4)
+
+Reading a Parquet File from Azure Blob Storage
+----------------------------------------------
+
+The code below shows how to use the Azure Storage SDK together with pyarrow to
+read a Parquet file into a pandas DataFrame. The blob is downloaded into an
+in-memory buffer, so the whole file must fit in memory. The example is suitable
+for running inside a Jupyter notebook on a Python 3 kernel.
+
+Dependencies:
+
+* Python 3.6.2
+* azure-storage 0.36.0
+* pyarrow 0.8.0
+
+.. code-block:: python
+
+   import io
+
+   import pyarrow.parquet as pq
+   from azure.storage.blob import BlockBlobService
+
+   account_name = '...'
+   account_key = '...'
+   container_name = '...'
+   parquet_file = 'mysample.parquet'
+
+   block_blob_service = BlockBlobService(account_name=account_name,
+                                         account_key=account_key)
+   # In-memory buffer that receives the downloaded blob
+   byte_stream = io.BytesIO()
+   try:
+      block_blob_service.get_blob_to_stream(container_name=container_name,
+                                            blob_name=parquet_file,
+                                            stream=byte_stream)
+      # Rewind to the start of the buffer before handing it to pyarrow
+      byte_stream.seek(0)
+      df = pq.read_table(source=byte_stream).to_pandas()
+      # Do work on df ...
+   finally:
+      # Close the stream even if the download or the read fails
+      byte_stream.close()
+
 
 Review comment:
  @xhochy OK, I've responded to your last set of feedback. How are we looking now?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.


> [Python] Document reading Parquet files from Azure Blob Store
> -------------------------------------------------------------
>
>                 Key: ARROW-2066
>                 URL: https://issues.apache.org/jira/browse/ARROW-2066
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> See https://github.com/apache/arrow/issues/1510



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
