TheNeuralBit commented on a change in pull request #15600:
URL: https://github.com/apache/beam/pull/15600#discussion_r729968203
##########
File path: website/www/site/content/en/documentation/dsls/dataframes/overview.md
##########
@@ -55,32 +50,8 @@ To use the DataFrames API in a larger pipeline, you can convert a PCollection to
Here’s an example that creates a schema-aware PCollection, converts it to a
DataFrame using `to_dataframe`, processes the DataFrame, and then converts the
DataFrame back to a PCollection using `to_pcollection`:
-<!-- TODO(BEAM-11480): Convert these examples to snippets -->
{{< highlight py >}}
-from apache_beam.dataframe.convert import to_dataframe
-from apache_beam.dataframe.convert import to_pcollection
-...
- # Read the text file[pattern] into a PCollection.
- lines = p | 'Read' >> ReadFromText(known_args.input)
-
- words = (
- lines
- | 'Split' >> beam.FlatMap(
- lambda line: re.findall(r'[\w]+', line)).with_output_types(str)
- # Map to Row objects to generate a schema suitable for conversion
- # to a dataframe.
- | 'ToRows' >> beam.Map(lambda word: beam.Row(word=word)))
-
- df = to_dataframe(words)
- df['count'] = 1
- counted = df.groupby('word').sum()
- counted.to_csv(known_args.output)
-
- # Deferred DataFrames can also be converted back to schema'd PCollections
- counted_pc = to_pcollection(counted, include_indexes=True)
-
- # Do something with counted_pc
- ...
+{{< code_sample "sdks/python/apache_beam/examples/dataframe/wordcount.py" DataFrame_wordcount >}}
Review comment:
Yes, that's right. I guess I can just hardcode the imports here to keep them in. Thanks for pointing this out.
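For context, the deferred DataFrame operations in the snippet being replaced by the `code_sample` shortcode follow ordinary pandas semantics. A minimal sketch of the same word-count logic in plain pandas (the sample data is illustrative, not from the PR; in the Beam version `df` would come from `to_dataframe(words)` and results would go back through `to_pcollection`):

```python
import pandas as pd

# Mirror of the deferred operations in the removed snippet:
# assign a constant "count" column, then sum counts per word.
words = pd.DataFrame({'word': ['the', 'cat', 'sat', 'the']})
words['count'] = 1
counted = words.groupby('word').sum()

print(counted)
```

The Beam DataFrame API defers these operations and executes them as pipeline transforms, but the per-element semantics match what plain pandas produces here.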