[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=293385&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-293385 ]
ASF GitHub Bot logged work on BEAM-7389:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Aug/19 21:09
            Start Date: 12/Aug/19 21:09
    Worklog Time Spent: 10m
      Work Description: davidcavazos commented on pull request #9262: [BEAM-7389] Add code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r313128647

 ##########
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##########
@@ -19,10 +19,151 @@ limitations under the License.
 -->
 # Regex
+
+<script type="text/javascript">
+localStorage.setItem('language', 'language-py')
+</script>
+
 Filters input string elements based on a regex. May also transform them based on the matching groups.
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates.
-## Related transforms
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text strings.
+Then, we use the `re` module to search, replace, and split through the text elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions, such as
+[regex101](https://regex101.com/).
+Make sure to specify the Python flavor in the left sidebar.
+
+### Example 1: Regex match
+
+[`re.match`](https://docs.python.org/3/library/re.html#re.match)
+will try to match the regular expression from the beginning of the string.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py tag:regex_match %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py tag:plant_matches %}```
+
+<table>
+  <td>
+    <a class="button" target="_blank"
+        href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py">
+      <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+        width="20px" height="20px" alt="View on GitHub" />
+      View on GitHub
+    </a>
+  </td>
+</table>
+<br>
+
+### Example 2: Regex search
+
+[`re.search`](https://docs.python.org/3/library/re.html#re.search)
+will try to search for the first occurrence of the regular expression in the string.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py tag:regex_search %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py tag:plant_matches %}```
+
+<table>
+  <td>
+    <a class="button" target="_blank"
+        href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py">
+      <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+        width="20px" height="20px" alt="View on GitHub" />
+      View on GitHub
+    </a>
+  </td>
+</table>
+<br>
+
+### Example 3: Regex find all
+
+[`re.finditer`](https://docs.python.org/3/library/re.html#re.finditer)
+will try to search for all the occurrences of the regular expression in the string.
+This returns an iterator of match objects.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py tag:regex_find_all %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py tag:words %}```
+
+<table>
+  <td>
+    <a class="button" target="_blank"
+        href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py">
+      <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png"
+        width="20px" height="20px" alt="View on GitHub" />
+      View on GitHub
+    </a>
+  </td>
+</table>
+<br>
+
+### Example 4: Regex replace
+
+[`re.sub`](https://docs.python.org/3/library/re.html#re.sub)
+will substitute occurrences of the regular expression in the string.

Review comment:
Done

----------------------------------------------------------------

Issue Time Tracking
-------------------

    Worklog Id:     (was: 293385)
    Time Spent: 41h 20m  (was: 41h 10m)

> Colab examples for element-wise transforms (Python)
> ---------------------------------------------------
>
>                 Key: BEAM-7389
>                 URL: https://issues.apache.org/jira/browse/BEAM-7389
>             Project: Beam
>          Issue Type: Improvement
>          Components: website
>            Reporter: Rose Nguyen
>            Assignee: David Cavazos
>            Priority: Minor
>          Time Spent: 41h 20m
>  Remaining Estimate: 0h
>
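
The diff above describes `re.match`, `re.search`, `re.finditer`, and `re.sub`, but the snippets it pulls in through `github_sample` are not reproduced in this message. The following is a minimal sketch of how the four calls differ, using only Python's standard library. The sample strings, the patterns, and the helper names are invented for illustration and are not taken from the PR's `regex.py`; in a Beam pipeline, functions like these would presumably be applied to each element of the `PCollection`.

```python
import re

# Hypothetical pattern: "<name>, <duration>" with two named groups.
PLANT = re.compile(r"(?P<name>\w+), *(?P<duration>\w+)")


def match_at_start(text):
    """re.match: succeeds only if the pattern matches at position 0."""
    m = PLANT.match(text)
    return m.group("name") if m else None


def search_anywhere(text):
    """re.search: returns the first match found anywhere in the string."""
    m = PLANT.search(text)
    return m.group("name") if m else None


def find_all_words(text):
    """re.finditer: yields one match object per non-overlapping match."""
    return [m.group() for m in re.finditer(r"\w+", text)]


def replace_separator(text):
    """re.sub: replaces every match of the pattern with the given string."""
    return re.sub(r"\s*,\s*", ": ", text)


if __name__ == "__main__":
    # Invented sample elements, one of them with leading whitespace.
    for line in ["carrot, biennial", "  tomato, annual"]:
        print(repr(line))
        print("  match  :", match_at_start(line))
        print("  search :", search_anywhere(line))
        print("  words  :", find_all_words(line))
        print("  sub    :", replace_separator(line))
```

The second sample line shows the practical difference between the first two calls: the leading spaces make `re.match` fail, while `re.search` still finds the plant name further into the string.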