This is an automated email from the ASF dual-hosted git repository.

ephraimanierobi pushed a commit to branch v2-8-test
in repository https://gitbox.apache.org/repos/asf/airflow.git
commit ff104018545582eab9d57f86705b3844e1429c06
Author: Kenten Danas <[email protected]>
AuthorDate: Tue Jan 23 13:42:31 2024 -0800

    Update Objectstore tutorial with prereqs section (#36983)

    * Add prerequisites section to object storage tutorial

    * Fix trailing whitespace

    (cherry picked from commit 891b9bcc690623fff364376e165e2ccade855ab1)
---
 docs/apache-airflow/tutorial/objectstorage.rst | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/docs/apache-airflow/tutorial/objectstorage.rst b/docs/apache-airflow/tutorial/objectstorage.rst
index 610450b931..943e8031a7 100644
--- a/docs/apache-airflow/tutorial/objectstorage.rst
+++ b/docs/apache-airflow/tutorial/objectstorage.rst
@@ -25,17 +25,23 @@
 This tutorial shows how to use the Object Storage API to manage objects that
 reside on object storage, like S3, gcs and azure blob storage. The API is
 introduced as part of Airflow 2.8.

-The tutorial covers a simple pattern that is often used in data engineering and
-data science workflows: accessing a web api, saving and analyzing the result. For the
-tutorial to work you will need to have Duck DB installed, which is a in-process
-analytical database. You can do this by running ``pip install duckdb``. The tutorial
-makes use of S3 Object Storage. This requires that the amazon provider is installed
-including ``s3fs`` by running ``pip install apache-airflow-providers-amazon[s3fs]``.
-If you would like to use a different storage provider, you can do so by changing the
-URL in the ``create_object_storage_path`` function to the appropriate URL for your
-provider, for example by replacing ``s3://`` with ``gs://`` for Google Cloud Storage.
-You will also need the right provider to be installed then. Finally, you will need
-``pandas``, which can be installed by running ``pip install pandas``.
+The tutorial covers a simple pattern that is often used in data engineering and data
+science workflows: accessing a web api, saving and analyzing the result.
+
+Prerequisites
+-------------
+To complete this tutorial, you need a few things:
+
+- DuckDB, an in-process analytical database,
+  which can be installed by running ``pip install duckdb``.
+- An S3 bucket, along with the Amazon provider including ``s3fs``. You can install
+  the provider package by running
+  ``pip install apache-airflow-providers-amazon[s3fs]``.
+  Alternatively, you can use a different storage provider by changing the URL in
+  the ``create_object_storage_path`` function to the appropriate URL for your
+  provider, for example by replacing ``s3://`` with ``gs://`` for Google Cloud
+  Storage, and installing a different provider.
+- ``pandas``, which you can install by running ``pip install pandas``.

 Creating an ObjectStoragePath
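The new prerequisites section notes that switching storage providers amounts to changing the URL scheme passed to the tutorial's ``create_object_storage_path`` helper (e.g. ``s3://`` to ``gs://``), plus installing the matching provider package. As a minimal standalone sketch of that scheme substitution, using only the Python standard library (the bucket and key below are placeholders, not taken from the tutorial):

```python
from urllib.parse import urlsplit, urlunsplit


def swap_scheme(url: str, new_scheme: str) -> str:
    """Replace the scheme of an object-storage URL, e.g. s3:// -> gs://.

    The bucket and key are left untouched; the Airflow provider package
    matching the new scheme must still be installed separately.
    """
    parts = urlsplit(url)
    return urlunsplit(
        (new_scheme, parts.netloc, parts.path, parts.query, parts.fragment)
    )


# Placeholder bucket/key, purely illustrative.
print(swap_scheme("s3://my-bucket/raw/data.parquet", "gs"))
# -> gs://my-bucket/raw/data.parquet
```

The same swap works for any supported scheme, e.g. ``abfs`` for Azure Blob Storage, as long as the corresponding provider (with its filesystem extra) is installed.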
