This is an automated email from the ASF dual-hosted git repository.
bhulette pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new fdf0636 Minor: Add more links to DataFrame API documentation (#15661)
fdf0636 is described below
commit fdf06361cd609335dc6c9763fb09f4e6b3e29e36
Author: Brian Hulette <[email protected]>
AuthorDate: Fri Oct 15 09:32:34 2021 -0700
Minor: Add more links to DataFrame API documentation (#15661)
* Add links to API documentation
* Pandas -> pandas, explicitly link to DataFrame examples
* Drop 'standard'
---
.../en/documentation/dsls/dataframes/overview.md | 21 +++++++++++++--------
.../site/layouts/partials/section-menu/en/sdks.html | 5 ++++-
2 files changed, 17 insertions(+), 9 deletions(-)
diff --git a/website/www/site/content/en/documentation/dsls/dataframes/overview.md b/website/www/site/content/en/documentation/dsls/dataframes/overview.md
index 0620da4..c2c9f8f 100644
--- a/website/www/site/content/en/documentation/dsls/dataframes/overview.md
+++ b/website/www/site/content/en/documentation/dsls/dataframes/overview.md
@@ -54,15 +54,15 @@ with beam.Pipeline() as p:
pandas is able to infer column names from the first row of the CSV data, which is where `passenger_count` and `DOLocationID` come from.
-In this example, the only traditional Beam type is the `Pipeline` instance. Otherwise the example is written completely with the DataFrame API. This is possible because the Beam DataFrame API includes its own IO operations (for example, `read_csv` and `to_csv`) based on the pandas native implementations. `read_*` and `to_*` operations support file patterns and any Beam-compatible file system. The grouping is accomplished with a group-by-key, and arbitrary pandas operations (in this case, [...]
+In this example, the only traditional Beam type is the `Pipeline` instance. Otherwise the example is written completely with the DataFrame API. This is possible because the Beam DataFrame API includes its own IO operations (for example, [`read_csv`][pydoc_read_csv] and [`to_csv`][pydoc_to_csv]) based on the pandas native implementations. `read_*` and `to_*` operations support file patterns and any Beam-compatible file system. The grouping is accomplished with a group-by-key, and arbitrar [...]
-The Beam DataFrame API aims to be compatible with the native pandas implementation, with a few caveats detailed below in [Differences from standard pandas](/documentation/dsls/dataframes/differences-from-pandas/).
+The Beam DataFrame API aims to be compatible with the native pandas implementation, with a few caveats detailed below in [Differences from pandas](/documentation/dsls/dataframes/differences-from-pandas/).
## Embedding DataFrames in a pipeline
To use the DataFrames API in a larger pipeline, you can convert a PCollection to a DataFrame, process the DataFrame, and then convert the DataFrame back to a PCollection. In order to convert a PCollection to a DataFrame and back, you have to use PCollections that have [schemas](https://beam.apache.org/documentation/programming-guide/#what-is-a-schema) attached. A PCollection with a schema attached is also referred to as a *schema-aware PCollection*. To learn more about attaching a schema [...]
-Here’s an example that creates a schema-aware PCollection, converts it to a DataFrame using `to_dataframe`, processes the DataFrame, and then converts the DataFrame back to a PCollection using `to_pcollection`:
+Here’s an example that creates a schema-aware PCollection, converts it to a DataFrame using [`to_dataframe`][pydoc_to_dataframe], processes the DataFrame, and then converts the DataFrame back to a PCollection using [`to_pcollection`][pydoc_to_pcollection]:
<!-- TODO(BEAM-11480): Convert these examples to snippets -->
{{< highlight py >}}
@@ -96,7 +96,7 @@ You can find the full wordcount example on
You can find the full wordcount example on [GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/wordcount.py), along with other [example DataFrame pipelines](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/dataframe/).
-It’s also possible to use the DataFrame API by passing a function to [`DataframeTransform`][pydoc_dataframe_transform]:
+It’s also possible to use the DataFrame API by passing a function to [`DataframeTransform`][pydoc_DataframeTransform]:
{{< highlight py >}}
from apache_beam.dataframe.transforms import DataframeTransform
@@ -110,9 +110,9 @@ with beam.Pipeline() as p:
...
{{< /highlight >}}
-[`DataframeTransform`][pydoc_dataframe_transform] is similar to [`SqlTransform`][pydoc_sql_transform] from the [Beam SQL](https://beam.apache.org/documentation/dsls/sql/overview/) DSL. Where `SqlTransform` translates a SQL query to a PTransform, `DataframeTransform` is a PTransform that applies a function that takes and returns DataFrames. A `DataframeTransform` can be particularly useful if you have a stand-alone function that can be called both on Beam and on ordinary pandas DataFrames.
+[`DataframeTransform`][pydoc_DataframeTransform] is similar to [`SqlTransform`][pydoc_SqlTransform] from the [Beam SQL](https://beam.apache.org/documentation/dsls/sql/overview/) DSL. Where [`SqlTransform`][pydoc_SqlTransform] translates a SQL query to a PTransform, [`DataframeTransform`][pydoc_DataframeTransform] is a PTransform that applies a function that takes and returns DataFrames. A [`DataframeTransform`][pydoc_DataframeTransform] can be particularly useful if you have a stand-alon [...]
-`DataframeTransform` can accept and return multiple PCollections by name and by keyword, as shown in the following examples:
+[`DataframeTransform`][pydoc_DataframeTransform] can accept and return multiple PCollections by name and by keyword, as shown in the following examples:
{{< highlight py >}}
output = (pc1, pc2) | DataframeTransform(lambda df1, df2: ...)
@@ -124,7 +124,12 @@ pc1, pc2 = {'a': pc} | DataframeTransform(lambda a: expr1, expr2)
{...} = {a: pc} | DataframeTransform(lambda a: {...})
{{< /highlight >}}
-[pydoc_dataframe_transform]: https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.transforms.html#apache_beam.dataframe.transforms.DataframeTransform
-[pydoc_sql_transform]: https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform
+[pydoc_read_csv]: https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.io.html#apache_beam.dataframe.io.read_csv
+[pydoc_to_csv]: https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.frames.html#apache_beam.dataframe.frames.DeferredDataFrame.to_csv
+[pydoc_sum]: https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.frames.html#apache_beam.dataframe.frames.DeferredDataFrame.sum
+[pydoc_DataframeTransform]: https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.transforms.html#apache_beam.dataframe.transforms.DataframeTransform
+[pydoc_SqlTransform]: https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.sql.html#apache_beam.transforms.sql.SqlTransform
+[pydoc_to_dataframe]: https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_dataframe
+[pydoc_to_pcollection]: https://beam.apache.org/releases/pydoc/current/apache_beam.dataframe.convert.html#apache_beam.dataframe.convert.to_pcollection
{{< button-colab url="https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/tour-of-beam/dataframes.ipynb" >}}
diff --git a/website/www/site/layouts/partials/section-menu/en/sdks.html b/website/www/site/layouts/partials/section-menu/en/sdks.html
index dbcc3c2..d46e05d 100644
--- a/website/www/site/layouts/partials/section-menu/en/sdks.html
+++ b/website/www/site/layouts/partials/section-menu/en/sdks.html
@@ -112,7 +112,10 @@
<span class="section-nav-list-title">DataFrames</span>
<ul class="section-nav-list">
<li><a href="/documentation/dsls/dataframes/overview/">Overview</a></li>
- <li><a href="/documentation/dsls/dataframes/differences-from-pandas/">Differences from Pandas</a></li>
+ <li><a href="/documentation/dsls/dataframes/differences-from-pandas/">Differences from pandas</a></li>
+ <li><a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/dataframe" target="_blank">
+ Example pipelines <img src="/images/external-link-icon.png" width="14" height="14" alt="External link."></a>
+ </li>
<li><a href="https://beam.apache.org/releases/pydoc/{{.Site.Params.release_latest }}/apache_beam.dataframe.html" target="_blank">
DataFrame API reference <img src="/images/external-link-icon.png" width="14" height="14" alt="External link."></a>
</li>