[ https://issues.apache.org/jira/browse/BEAM-11480?focusedWorklogId=665752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-665752 ]

ASF GitHub Bot logged work on BEAM-11480:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Oct/21 00:30
            Start Date: 17/Oct/21 00:30
    Worklog Time Spent: 10m 
      Work Description: TheNeuralBit commented on a change in pull request #15600:
URL: https://github.com/apache/beam/pull/15600#discussion_r729968203



##########
File path: website/www/site/content/en/documentation/dsls/dataframes/overview.md
##########
@@ -55,32 +50,8 @@ To use the DataFrames API in a larger pipeline, you can convert a PCollection to
 
 Here’s an example that creates a schema-aware PCollection, converts it to a DataFrame using `to_dataframe`, processes the DataFrame, and then converts the DataFrame back to a PCollection using `to_pcollection`:
 
-<!-- TODO(BEAM-11480): Convert these examples to snippets -->
 {{< highlight py >}}
-from apache_beam.dataframe.convert import to_dataframe
-from apache_beam.dataframe.convert import to_pcollection
-...
-    # Read the text file[pattern] into a PCollection.
-    lines = p | 'Read' >> ReadFromText(known_args.input)
-
-    words = (
-        lines
-        | 'Split' >> beam.FlatMap(
-            lambda line: re.findall(r'[\w]+', line)).with_output_types(str)
-        # Map to Row objects to generate a schema suitable for conversion
-        # to a dataframe.
-        | 'ToRows' >> beam.Map(lambda word: beam.Row(word=word)))
-
-    df = to_dataframe(words)
-    df['count'] = 1
-    counted = df.groupby('word').sum()
-    counted.to_csv(known_args.output)
-
-    # Deferred DataFrames can also be converted back to schema'd PCollections
-    counted_pc = to_pcollection(counted, include_indexes=True)
-
-    # Do something with counted_pc
-    ...
+{{< code_sample "sdks/python/apache_beam/examples/dataframe/wordcount.py" DataFrame_wordcount >}}

Review comment:
       Yes that's right, I guess I can just hardcode the imports here to keep 
them in. Thanks for pointing this out.
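
For reference, a minimal sketch of what the tagged region in
sdks/python/apache_beam/examples/dataframe/wordcount.py could look like with the
imports kept. The [START]/[END] marker names are an assumption about how the
code_sample shortcode picks up the region; the pipeline body mirrors the example
being removed from the page above.

# Sketch only: the DataFrame_wordcount markers are assumed to be the tags the
# website's code_sample shortcode extracts.
import re

import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe
from apache_beam.dataframe.convert import to_pcollection
from apache_beam.io import ReadFromText

...
    # [START DataFrame_wordcount]
    # Read the text file[pattern] into a PCollection.
    lines = p | 'Read' >> ReadFromText(known_args.input)

    words = (
        lines
        | 'Split' >> beam.FlatMap(
            lambda line: re.findall(r'[\w]+', line)).with_output_types(str)
        # Map to Row objects to generate a schema suitable for conversion
        # to a dataframe.
        | 'ToRows' >> beam.Map(lambda word: beam.Row(word=word)))

    df = to_dataframe(words)
    df['count'] = 1
    counted = df.groupby('word').sum()
    counted.to_csv(known_args.output)

    # Deferred DataFrames can also be converted back to schema'd PCollections.
    counted_pc = to_pcollection(counted, include_indexes=True)
    # [END DataFrame_wordcount]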






Issue Time Tracking
-------------------

    Worklog Id:     (was: 665752)
    Time Spent: 2.5h  (was: 2h 20m)

> Create snippets for DataFrame examples and link them to docs
> ------------------------------------------------------------
>
>                 Key: BEAM-11480
>                 URL: https://issues.apache.org/jira/browse/BEAM-11480
>             Project: Beam
>          Issue Type: Task
>          Components: dsl-dataframe, sdk-py-core
>            Reporter: Brian Hulette
>            Assignee: Brian Hulette
>            Priority: P3
>              Labels: dataframe-api
>             Fix For: Not applicable
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The examples at 
> https://beam.apache.org/documentation/dsls/dataframes/overview/ should 
> reference snippets that are tested.
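
Since the ask here is that the published examples reference snippets that are
tested, here is a minimal sketch of how the wordcount snippet could be exercised
in a test. It assumes the example module exposes a run() entry point that accepts
--input/--output flags like other Beam examples; that signature is an assumption,
not something confirmed in this thread.

import glob
import tempfile
import unittest

from apache_beam.examples.dataframe import wordcount


class DataFrameWordCountTest(unittest.TestCase):
  def test_snippet_runs(self):
    with tempfile.TemporaryDirectory() as tmp:
      input_path = f'{tmp}/input.txt'
      output_prefix = f'{tmp}/counts'
      with open(input_path, 'w') as f:
        f.write('the cat and the hat and the cat\n')
      # run() with --input/--output is assumed, mirroring other Beam examples.
      wordcount.run(['--input', input_path, '--output', output_prefix])
      # The deferred to_csv call may shard its output, so only check that
      # something was written under the output prefix.
      self.assertTrue(glob.glob(output_prefix + '*'))


if __name__ == '__main__':
  unittest.main()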


