This is an automated email from the ASF dual-hosted git repository.
thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git
The following commit(s) were added to refs/heads/main by this push:
new 82b37fb ARROW-13751: Recipe for searching for values matching a predicate (#79)
82b37fb is described below
commit 82b37fbe91e2c098d460ebec3e02faac7d0b7c42
Author: Alessandro Molina <[email protected]>
AuthorDate: Tue Oct 5 11:31:32 2021 +0200
ARROW-13751: Recipe for searching for values matching a predicate (#79)
* Recipe for searching for values matching a predicate
* Apply suggestions from code review
Co-authored-by: Weston Pace <[email protected]>
* Wrong heading
Co-authored-by: Weston Pace <[email protected]>
---
python/source/data.rst | 56 +++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 55 insertions(+), 1 deletion(-)
diff --git a/python/source/data.rst b/python/source/data.rst
index bdcf648..1d86b56 100644
--- a/python/source/data.rst
+++ b/python/source/data.rst
@@ -180,4 +180,58 @@ We can combine them into a single table using :func:`pyarrow.concat_tables`:
the result will be a table with multiple chunks, each pointing to the original
data that has been appended. Under some conditions, Arrow might have to
cast data from one type to another (if `promote=True`). In such cases the data
- will need to be copied and an extra cost will occur.
\ No newline at end of file
+ will need to be copied and an extra cost will occur.
+
+Searching for values matching a predicate in Arrays
+===================================================
+
+If you have to look for values matching a predicate in Arrow arrays,
+the :mod:`pyarrow.compute` module provides several functions that
+can be used to find the values you are looking for.
+
+For example, given an array with numbers from 0 to 9, if we
+want to look only for those greater than 5 we could use the
+:func:`pyarrow.compute.greater` function and get back a boolean
+mask marking the elements that fit our predicate:
+
+.. testcode::
+
+ import pyarrow as pa
+ import pyarrow.compute as pc
+
+ arr = pa.array(range(10))
+ gtfive = pc.greater(arr, 5)
+
+ print(gtfive.to_string())
+
+.. testoutput::
+
+ [
+ false,
+ false,
+ false,
+ false,
+ false,
+ false,
+ true,
+ true,
+ true,
+ true
+ ]
+
+Furthermore, we can filter the array with :func:`pyarrow.compute.filter`
+to get only the entries that match our predicate:
+
+.. testcode::
+
+ filtered_array = pc.filter(arr, gtfive)
+ print(filtered_array)
+
+.. testoutput::
+
+ [
+ 6,
+ 7,
+ 8,
+ 9
+ ]
\ No newline at end of file