[jira] [Commented] (ARROW-2357) Benchmark PandasObjectIsNull

ASF GitHub Bot (JIRA) Mon, 02 Apr 2018 21:38:06 -0700

    [ https://issues.apache.org/jira/browse/ARROW-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423459#comment-16423459 ]


ASF GitHub Bot commented on ARROW-2357:
---------------------------------------

xhochy closed pull request #1798: ARROW-2357: [Python] Add microbenchmark for PandasObjectIsNull()
URL: https://github.com/apache/arrow/pull/1798

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/cpp/src/arrow/python/CMakeLists.txt b/cpp/src/arrow/python/CMakeLists.txt
index f931abe38..b985df914 100644
--- a/cpp/src/arrow/python/CMakeLists.txt
+++ b/cpp/src/arrow/python/CMakeLists.txt
@@ -50,6 +50,7 @@ set(ARROW_PYTHON_TEST_LINK_LIBS ${ARROW_PYTHON_MIN_TEST_LIBS})
 set(ARROW_PYTHON_SRCS
   arrow_to_pandas.cc
   arrow_to_python.cc
+  benchmark.cc
   builtin_convert.cc
   common.cc
   config.cc
@@ -99,6 +100,7 @@ install(FILES
   api.h
   arrow_to_pandas.h
   arrow_to_python.h
+  benchmark.h
   builtin_convert.h
   common.h
   config.h
diff --git a/cpp/src/arrow/python/benchmark.cc b/cpp/src/arrow/python/benchmark.cc
new file mode 100644
index 000000000..2d29f69d2
--- /dev/null
+++ b/cpp/src/arrow/python/benchmark.cc
@@ -0,0 +1,38 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <arrow/python/benchmark.h>
+#include <arrow/python/helpers.h>
+
+namespace arrow {
+namespace py {
+namespace benchmark {
+
+void Benchmark_PandasObjectIsNull(PyObject* list) {
+  if (!PyList_CheckExact(list)) {
+    PyErr_SetString(PyExc_TypeError, "expected a list");
+    return;
+  }
+  Py_ssize_t i, n = PyList_GET_SIZE(list);
+  for (i = 0; i < n; i++) {
+    internal::PandasObjectIsNull(PyList_GET_ITEM(list, i));
+  }
+}
+
+}  // namespace benchmark
+}  // namespace py
+}  // namespace arrow
diff --git a/cpp/src/arrow/python/benchmark.h b/cpp/src/arrow/python/benchmark.h
new file mode 100644
index 000000000..f88b6b432
--- /dev/null
+++ b/cpp/src/arrow/python/benchmark.h
@@ -0,0 +1,39 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_PYTHON_BENCHMARK_H
+#define ARROW_PYTHON_BENCHMARK_H
+
+#include "arrow/python/platform.h"
+
+#include "arrow/util/visibility.h"
+
+namespace arrow {
+namespace py {
+namespace benchmark {
+
+// Micro-benchmark routines for use from ASV
+
+// Run PandasObjectIsNull() once over every object in *list*
+ARROW_EXPORT
+void Benchmark_PandasObjectIsNull(PyObject* list);
+
+}  // namespace benchmark
+}  // namespace py
+}  // namespace arrow
+
+#endif  // ARROW_PYTHON_BENCHMARK_H
diff --git a/python/benchmarks/common.py b/python/benchmarks/common.py
index b205ba581..70cd92492 100644
--- a/python/benchmarks/common.py
+++ b/python/benchmarks/common.py
@@ -16,16 +16,23 @@
 # under the License.
 
 import codecs
+import decimal
+from functools import partial
+import itertools
 import os
 import sys
 import unicodedata
 
 import numpy as np
 
+import pyarrow as pa
+
 
 KILOBYTE = 1 << 10
 MEGABYTE = KILOBYTE * KILOBYTE
 
+DEFAULT_NONE_PROB = 0.3
+
 
 def _multiplicate_sequence(base, target_size):
     q, r = divmod(target_size, len(base))
@@ -97,3 +104,248 @@ def get_random_unicode(n, *, seed=42):
     result = ''.join(unicode_arr.tolist())
     assert len(result) == n, (len(result), len(unicode_arr))
     return result
+
+
+class BuiltinsGenerator(object):
+
+    def __init__(self, seed=42):
+        self.rnd = np.random.RandomState(seed)
+
+    def sprinkle(self, lst, prob, value):
+        """
+        Sprinkle *value* entries in list *lst* with likelihood *prob*.
+        """
+        for i, p in enumerate(self.rnd.random_sample(size=len(lst))):
+            if p < prob:
+                lst[i] = value
+
+    def sprinkle_nones(self, lst, prob):
+        """
+        Sprinkle None entries in list *lst* with likelihood *prob*.
+        """
+        self.sprinkle(lst, prob, None)
+
+    def generate_int_list(self, n, none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of Python ints with *none_prob* probability of
+        an entry being None.
+        """
+        data = list(range(n))
+        self.sprinkle_nones(data, none_prob)
+        return data
+
+    def generate_float_list(self, n, none_prob=DEFAULT_NONE_PROB,
+                            use_nan=False):
+        """
+        Generate a list of Python floats with *none_prob* probability of
+        an entry being None (or NaN if *use_nan* is true).
+        """
+        # Make sure we get Python floats, not np.float64
+        data = list(map(float, self.rnd.uniform(0.0, 1.0, n)))
+        assert len(data) == n
+        self.sprinkle(data, none_prob, value=float('nan') if use_nan else None)
+        return data
+
+    def generate_bool_list(self, n, none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of Python bools with *none_prob* probability of
+        an entry being None.
+        """
+        # Make sure we get Python bools, not np.bool_
+        data = [bool(x >= 0.5) for x in self.rnd.uniform(0.0, 1.0, n)]
+        assert len(data) == n
+        self.sprinkle_nones(data, none_prob)
+        return data
+
+    def generate_decimal_list(self, n, none_prob=DEFAULT_NONE_PROB,
+                              use_nan=False):
+        """
+        Generate a list of Python Decimals with *none_prob* probability of
+        an entry being None (or NaN if *use_nan* is true).
+        """
+        data = [decimal.Decimal('%.9f' % f)
+                for f in self.rnd.uniform(0.0, 1.0, n)]
+        assert len(data) == n
+        self.sprinkle(data, none_prob,
+                      value=decimal.Decimal('nan') if use_nan else None)
+        return data
+
+    def generate_object_list(self, n, none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of generic Python objects with *none_prob*
+        probability of an entry being None.
+        """
+        data = [object() for i in range(n)]
+        self.sprinkle_nones(data, none_prob)
+        return data
+
+    def _generate_varying_sequences(self, random_factory, n, min_size, max_size, none_prob):
+        """
+        Generate a list of *n* sequences of varying size between *min_size*
+        and *max_size*, with *none_prob* probability of an entry being None.
+        The base material for each sequence is obtained by calling
+        `random_factory(<some size>)`
+        """
+        base_size = 10000
+        base = random_factory(base_size + max_size)
+        data = []
+        for i in range(n):
+            off = self.rnd.randint(base_size)
+            if min_size == max_size:
+                size = min_size
+            else:
+                size = self.rnd.randint(min_size, max_size + 1)
+            data.append(base[off:off + size])
+        self.sprinkle_nones(data, none_prob)
+        assert len(data) == n
+        return data
+
+    def generate_fixed_binary_list(self, n, size, none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of bytestrings with a fixed *size*.
+        """
+        return self._generate_varying_sequences(get_random_bytes, n,
+                                                size, size, none_prob)
+
+
+    def generate_varying_binary_list(self, n, min_size, max_size,
+                                     none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of bytestrings with a random size between
+        *min_size* and *max_size*.
+        """
+        return self._generate_varying_sequences(get_random_bytes, n,
+                                                min_size, max_size, none_prob)
+
+
+    def generate_ascii_string_list(self, n, min_size, max_size,
+                                   none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of ASCII strings with a random size between
+        *min_size* and *max_size*.
+        """
+        return self._generate_varying_sequences(get_random_ascii, n,
+                                                min_size, max_size, none_prob)
+
+
+    def generate_unicode_string_list(self, n, min_size, max_size,
+                                     none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of unicode strings with a random size between
+        *min_size* and *max_size*.
+        """
+        return self._generate_varying_sequences(get_random_unicode, n,
+                                                min_size, max_size, none_prob)
+
+
+    def generate_int_list_list(self, n, min_size, max_size,
+                               none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of lists of Python ints with a random size between
+        *min_size* and *max_size*.
+        """
+        return self._generate_varying_sequences(
+            partial(self.generate_int_list, none_prob=none_prob),
+            n, min_size, max_size, none_prob)
+
+    def generate_tuple_list(self, n, none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of tuples with random values.
+        Each tuple has the form `(int value, float value, bool value)`
+        """
+        dicts = self.generate_dict_list(n, none_prob=none_prob)
+        tuples = [(d.get('u'), d.get('v'), d.get('w'))
+                  if d is not None else None
+                  for d in dicts]
+        assert len(tuples) == n
+        return tuples
+
+    def generate_dict_list(self, n, none_prob=DEFAULT_NONE_PROB):
+        """
+        Generate a list of dicts with random values.
+        Each dict has the form `{'u': int value, 'v': float value, 'w': bool value}`
+        """
+        ints = self.generate_int_list(n, none_prob=none_prob)
+        floats = self.generate_float_list(n, none_prob=none_prob)
+        bools = self.generate_bool_list(n, none_prob=none_prob)
+        dicts = []
+        # Keep half the Nones, omit the other half
+        keep_nones = itertools.cycle([True, False])
+        for u, v, w in zip(ints, floats, bools):
+            d = {}
+            if u is not None or next(keep_nones):
+                d['u'] = u
+            if v is not None or next(keep_nones):
+                d['v'] = v
+            if w is not None or next(keep_nones):
+                d['w'] = w
+            dicts.append(d)
+        self.sprinkle_nones(dicts, none_prob)
+        assert len(dicts) == n
+        return dicts
+
+    def get_type_and_builtins(self, n, type_name):
+        """
+        Return a `(arrow type, list)` tuple where the arrow type
+        corresponds to the given logical *type_name*, and the list
+        is a list of *n* random-generated Python objects compatible
+        with the arrow type.
+        """
+        size = None
+
+        if type_name in ('bool', 'decimal', 'ascii', 'unicode', 'int64 list'):
+            kind = type_name
+        elif type_name.startswith(('int', 'uint')):
+            kind = 'int'
+        elif type_name.startswith('float'):
+            kind = 'float'
+        elif type_name.startswith('struct'):
+            kind = 'struct'
+        elif type_name == 'binary':
+            kind = 'varying binary'
+        elif type_name.startswith('binary'):
+            kind = 'fixed binary'
+            size = int(type_name[6:])
+            assert size > 0
+        else:
+            raise ValueError("unrecognized type %r" % (type_name,))
+
+        if kind in ('int', 'float'):
+            ty = getattr(pa, type_name)()
+        elif kind == 'bool':
+            ty = pa.bool_()
+        elif kind == 'decimal':
+            ty = pa.decimal128(9, 9)
+        elif kind == 'fixed binary':
+            ty = pa.binary(size)
+        elif kind == 'varying binary':
+            ty = pa.binary()
+        elif kind in ('ascii', 'unicode'):
+            ty = pa.string()
+        elif kind == 'int64 list':
+            ty = pa.list_(pa.int64())
+        elif kind == 'struct':
+            ty = pa.struct([pa.field('u', pa.int64()),
+                            pa.field('v', pa.float64()),
+                            pa.field('w', pa.bool_())])
+
+        factories = {
+            'int': self.generate_int_list,
+            'float': self.generate_float_list,
+            'bool': self.generate_bool_list,
+            'decimal': self.generate_decimal_list,
+            'fixed binary': partial(self.generate_fixed_binary_list,
+                                    size=size),
+            'varying binary': partial(self.generate_varying_binary_list,
+                                      min_size=3, max_size=40),
+            'ascii': partial(self.generate_ascii_string_list,
+                             min_size=3, max_size=40),
+            'unicode': partial(self.generate_unicode_string_list,
+                               min_size=3, max_size=40),
+            'int64 list': partial(self.generate_int_list_list,
+                                  min_size=0, max_size=20),
+            'struct': self.generate_dict_list,
+            'struct from tuples': self.generate_tuple_list,
+        }
+        data = factories[kind](n)
+        return ty, data
diff --git a/python/benchmarks/convert_builtins.py b/python/benchmarks/convert_builtins.py
index a4dc9f262..91b15ecf5 100644
--- a/python/benchmarks/convert_builtins.py
+++ b/python/benchmarks/convert_builtins.py
@@ -15,233 +15,13 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from functools import partial
-import itertools
-
-import numpy as np
 import pyarrow as pa
 
 from . import common
 
 
-DEFAULT_NONE_PROB = 0.3
-
-
 # TODO:
 # - test dates and times
-# - test decimals
-
-class BuiltinsGenerator(object):
-
-    def __init__(self, seed=42):
-        self.rnd = np.random.RandomState(seed)
-
-    def sprinkle_nones(self, lst, prob):
-        """
-        Sprinkle None entries in list *lst* with likelihood *prob*.
-        """
-        for i, p in enumerate(self.rnd.random_sample(size=len(lst))):
-            if p < prob:
-                lst[i] = None
-
-    def generate_int_list(self, n, none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of Python ints with *none_prob* probability of
-        an entry being None.
-        """
-        data = list(range(n))
-        self.sprinkle_nones(data, none_prob)
-        return data
-
-    def generate_float_list(self, n, none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of Python floats with *none_prob* probability of
-        an entry being None.
-        """
-        # Make sure we get Python floats, not np.float64
-        data = list(map(float, self.rnd.uniform(0.0, 1.0, n)))
-        assert len(data) == n
-        self.sprinkle_nones(data, none_prob)
-        return data
-
-    def generate_bool_list(self, n, none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of Python bools with *none_prob* probability of
-        an entry being None.
-        """
-        # Make sure we get Python bools, not np.bool_
-        data = [bool(x >= 0.5) for x in self.rnd.uniform(0.0, 1.0, n)]
-        assert len(data) == n
-        self.sprinkle_nones(data, none_prob)
-        return data
-
-    def _generate_varying_sequences(self, random_factory, n, min_size, max_size, none_prob):
-        """
-        Generate a list of *n* sequences of varying size between *min_size*
-        and *max_size*, with *none_prob* probability of an entry being None.
-        The base material for each sequence is obtained by calling
-        `random_factory(<some size>)`
-        """
-        base_size = 10000
-        base = random_factory(base_size + max_size)
-        data = []
-        for i in range(n):
-            off = self.rnd.randint(base_size)
-            if min_size == max_size:
-                size = min_size
-            else:
-                size = self.rnd.randint(min_size, max_size + 1)
-            data.append(base[off:off + size])
-        self.sprinkle_nones(data, none_prob)
-        assert len(data) == n
-        return data
-
-    def generate_fixed_binary_list(self, n, size, none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of bytestrings with a fixed *size*.
-        """
-        return self._generate_varying_sequences(common.get_random_bytes, n,
-                                                size, size, none_prob)
-
-
-    def generate_varying_binary_list(self, n, min_size, max_size,
-                                     none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of bytestrings with a random size between
-        *min_size* and *max_size*.
-        """
-        return self._generate_varying_sequences(common.get_random_bytes, n,
-                                                min_size, max_size, none_prob)
-
-
-    def generate_ascii_string_list(self, n, min_size, max_size,
-                                   none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of ASCII strings with a random size between
-        *min_size* and *max_size*.
-        """
-        return self._generate_varying_sequences(common.get_random_ascii, n,
-                                                min_size, max_size, none_prob)
-
-
-    def generate_unicode_string_list(self, n, min_size, max_size,
-                                     none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of unicode strings with a random size between
-        *min_size* and *max_size*.
-        """
-        return self._generate_varying_sequences(common.get_random_unicode, n,
-                                                min_size, max_size, none_prob)
-
-
-    def generate_int_list_list(self, n, min_size, max_size,
-                               none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of lists of Python ints with a random size between
-        *min_size* and *max_size*.
-        """
-        return self._generate_varying_sequences(
-            partial(self.generate_int_list, none_prob=none_prob),
-            n, min_size, max_size, none_prob)
-
-    def generate_tuple_list(self, n, none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of tuples with random values.
-        Each tuple has the form `(int value, float value, bool value)`
-        """
-        dicts = self.generate_dict_list(n, none_prob=none_prob)
-        tuples = [(d.get('u'), d.get('v'), d.get('w'))
-                  if d is not None else None
-                  for d in dicts]
-        assert len(tuples) == n
-        return tuples
-
-    def generate_dict_list(self, n, none_prob=DEFAULT_NONE_PROB):
-        """
-        Generate a list of dicts with random values.
-        Each dict has the form `{'u': int value, 'v': float value, 'w': bool value}`
-        """
-        ints = self.generate_int_list(n, none_prob=none_prob)
-        floats = self.generate_float_list(n, none_prob=none_prob)
-        bools = self.generate_bool_list(n, none_prob=none_prob)
-        dicts = []
-        # Keep half the Nones, omit the other half
-        keep_nones = itertools.cycle([True, False])
-        for u, v, w in zip(ints, floats, bools):
-            d = {}
-            if u is not None or next(keep_nones):
-                d['u'] = u
-            if v is not None or next(keep_nones):
-                d['v'] = v
-            if w is not None or next(keep_nones):
-                d['w'] = w
-            dicts.append(d)
-        self.sprinkle_nones(dicts, none_prob)
-        assert len(dicts) == n
-        return dicts
-
-    def get_type_and_builtins(self, n, type_name):
-        """
-        Return a `(arrow type, list)` tuple where the arrow type
-        corresponds to the given logical *type_name*, and the list
-        is a list of *n* random-generated Python objects compatible
-        with the arrow type.
-        """
-        size = None
-
-        if type_name in ('bool', 'ascii', 'unicode', 'int64 list'):
-            kind = type_name
-        elif type_name.startswith(('int', 'uint')):
-            kind = 'int'
-        elif type_name.startswith('float'):
-            kind = 'float'
-        elif type_name.startswith('struct'):
-            kind = 'struct'
-        elif type_name == 'binary':
-            kind = 'varying binary'
-        elif type_name.startswith('binary'):
-            kind = 'fixed binary'
-            size = int(type_name[6:])
-            assert size > 0
-        else:
-            raise ValueError("unrecognized type %r" % (type_name,))
-
-        if kind in ('int', 'float'):
-            ty = getattr(pa, type_name)()
-        elif kind == 'bool':
-            ty = pa.bool_()
-        elif kind == 'fixed binary':
-            ty = pa.binary(size)
-        elif kind == 'varying binary':
-            ty = pa.binary()
-        elif kind in ('ascii', 'unicode'):
-            ty = pa.string()
-        elif kind == 'int64 list':
-            ty = pa.list_(pa.int64())
-        elif kind == 'struct':
-            ty = pa.struct([pa.field('u', pa.int64()),
-                            pa.field('v', pa.float64()),
-                            pa.field('w', pa.bool_())])
-
-        factories = {
-            'int': self.generate_int_list,
-            'float': self.generate_float_list,
-            'bool': self.generate_bool_list,
-            'fixed binary': partial(self.generate_fixed_binary_list,
-                                    size=size),
-            'varying binary': partial(self.generate_varying_binary_list,
-                                      min_size=3, max_size=40),
-            'ascii': partial(self.generate_ascii_string_list,
-                             min_size=3, max_size=40),
-            'unicode': partial(self.generate_unicode_string_list,
-                               min_size=3, max_size=40),
-            'int64 list': partial(self.generate_int_list_list,
-                                  min_size=0, max_size=20),
-            'struct': self.generate_dict_list,
-            'struct from tuples': self.generate_tuple_list,
-        }
-        data = factories[kind](n)
-        return ty, data
 
 
 class ConvertPyListToArray(object):
@@ -250,7 +30,7 @@ class ConvertPyListToArray(object):
     """
     size = 10 ** 5
     types = ('int32', 'uint32', 'int64', 'uint64',
-             'float32', 'float64', 'bool',
+             'float32', 'float64', 'bool', 'decimal',
              'binary', 'binary10', 'ascii', 'unicode',
              'int64 list', 'struct', 'struct from tuples')
 
@@ -258,7 +38,7 @@ class ConvertPyListToArray(object):
     params = [types]
 
     def setup(self, type_name):
-        gen = BuiltinsGenerator()
+        gen = common.BuiltinsGenerator()
         self.ty, self.data = gen.get_type_and_builtins(self.size, type_name)
 
     def time_convert(self, *args):
@@ -270,15 +50,15 @@ class InferPyListToArray(object):
     Benchmark pa.array(list of values) with type inference
     """
     size = 10 ** 5
-    types = ('int64', 'float64', 'bool', 'binary', 'ascii', 'unicode',
-             'int64 list')
+    types = ('int64', 'float64', 'bool', 'decimal', 'binary', 'ascii',
+             'unicode', 'int64 list')
     # TODO add 'struct' when supported
 
     param_names = ['type']
     params = [types]
 
     def setup(self, type_name):
-        gen = BuiltinsGenerator()
+        gen = common.BuiltinsGenerator()
         self.ty, self.data = gen.get_type_and_builtins(self.size, type_name)
 
     def time_infer(self, *args):
@@ -292,7 +72,7 @@ class ConvertArrayToPyList(object):
     """
     size = 10 ** 5
     types = ('int32', 'uint32', 'int64', 'uint64',
-             'float32', 'float64', 'bool',
+             'float32', 'float64', 'bool', 'decimal',
              'binary', 'binary10', 'ascii', 'unicode',
              'int64 list', 'struct')
 
@@ -300,7 +80,7 @@ class ConvertArrayToPyList(object):
     params = [types]
 
     def setup(self, type_name):
-        gen = BuiltinsGenerator()
+        gen = common.BuiltinsGenerator()
         self.ty, self.data = gen.get_type_and_builtins(self.size, type_name)
         self.arr = pa.array(self.data, type=self.ty)
 
diff --git a/python/benchmarks/microbenchmarks.py b/python/benchmarks/microbenchmarks.py
new file mode 100644
index 000000000..bae5806e1
--- /dev/null
+++ b/python/benchmarks/microbenchmarks.py
@@ -0,0 +1,47 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pyarrow as pa
+import pyarrow.benchmark as pb
+
+from . import common
+
+
+class PandasObjectIsNull(object):
+    size = 10 ** 5
+    types = ('int', 'float', 'object', 'decimal')
+
+    param_names = ['type']
+    params = [types]
+
+    def setup(self, type_name):
+        gen = common.BuiltinsGenerator()
+        if type_name == 'int':
+            lst = gen.generate_int_list(self.size)
+        elif type_name == 'float':
+            lst = gen.generate_float_list(self.size, use_nan=True)
+        elif type_name == 'object':
+            lst = gen.generate_object_list(self.size)
+        elif type_name == 'decimal':
+            lst = gen.generate_decimal_list(self.size)
+        else:
+            assert 0
+        self.lst = lst
+
+    def time_PandasObjectIsNull(self, *args):
+        pb.benchmark_PandasObjectIsNull(self.lst)
+
diff --git a/python/pyarrow/benchmark.pxi b/python/pyarrow/benchmark.pxi
new file mode 100644
index 000000000..ab251017d
--- /dev/null
+++ b/python/pyarrow/benchmark.pxi
@@ -0,0 +1,20 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+
+def benchmark_PandasObjectIsNull(list obj):
+    Benchmark_PandasObjectIsNull(obj)
diff --git a/python/pyarrow/benchmark.py b/python/pyarrow/benchmark.py
new file mode 100644
index 000000000..ef1ef538d
--- /dev/null
+++ b/python/pyarrow/benchmark.py
@@ -0,0 +1,20 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# flake8: noqa
+
+from pyarrow.lib import benchmark_PandasObjectIsNull
diff --git a/python/pyarrow/includes/libarrow.pxd b/python/pyarrow/includes/libarrow.pxd
index dbcc94c6f..8654c9c63 100644
--- a/python/pyarrow/includes/libarrow.pxd
+++ b/python/pyarrow/includes/libarrow.pxd
@@ -978,6 +978,10 @@ cdef extern from 'arrow/python/config.h' namespace 'arrow::py':
     void set_numpy_nan(object o)
 
 
+cdef extern from 'arrow/python/benchmark.h' namespace 'arrow::py::benchmark':
+    void Benchmark_PandasObjectIsNull(object lst) except *
+
+
 cdef extern from 'arrow/util/compression.h' namespace 'arrow' nogil:
     enum CompressionType" arrow::Compression::type":
         CompressionType_UNCOMPRESSED" arrow::Compression::UNCOMPRESSED"
diff --git a/python/pyarrow/lib.pyx b/python/pyarrow/lib.pyx
index b4ca49caf..672be08df 100644
--- a/python/pyarrow/lib.pyx
+++ b/python/pyarrow/lib.pyx
@@ -126,5 +126,8 @@ include "feather.pxi"
 # Python serialization
 include "serialization.pxi"
 
+# Micro-benchmark routines
+include "benchmark.pxi"
+
 # Public API
 include "public-api.pxi"
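
For readers who want to try the data-generation idea from the diff above outside of ASV, here is a minimal standalone sketch of the `sprinkle` technique used by `BuiltinsGenerator`: each list entry is overwritten with a marker value with a given probability. The names `sprinkle` and `generate_int_list` mirror the diff, but the free-function form, the `rng` and `seed` parameters, and the use of the standard `random` module instead of NumPy are assumptions made to keep the example dependency-free.

```python
import random


def sprinkle(lst, prob, value, rng):
    """Overwrite each entry of *lst* with *value* with probability *prob*."""
    for i in range(len(lst)):
        if rng.random() < prob:
            lst[i] = value


def generate_int_list(n, none_prob=0.3, seed=42):
    """Return list(range(n)) with roughly *none_prob* of the entries set to None."""
    rng = random.Random(seed)  # assumption: stdlib RNG, not np.random.RandomState
    data = list(range(n))
    sprinkle(data, none_prob, None, rng)
    return data


data = generate_int_list(10_000)
none_fraction = data.count(None) / len(data)
```

With a fixed seed the generated list is reproducible, which is presumably why the generator in the diff also takes a `seed` argument: benchmark inputs stay identical from run to run.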


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Benchmark PandasObjectIsNull
> ----------------------------
>
>                 Key: ARROW-2357
>                 URL: https://issues.apache.org/jira/browse/ARROW-2357
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 0.9.0
>            Reporter: Phillip Cloud
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> This is a follow-up to ARROW-2354 ([C++] Make PyDecimal_Check() faster). We 
> should benchmark {{PandasObjectIsNull}} as it gets called in many of our 
> conversion routines in tight loops.
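
A rough pure-Python analogue of the loop this micro-benchmark exercises is sketched below. It assumes a null check that treats None, float NaN and Decimal NaN as null; the helper `object_is_null` is hypothetical and is not the C++ `PandasObjectIsNull()`, it only illustrates the shape of the work being timed: one check per list element, with the result discarded.

```python
import decimal
import math
import timeit


def object_is_null(obj):
    """Hypothetical stand-in: True for None, float NaN and Decimal NaN."""
    if obj is None:
        return True
    if isinstance(obj, float):
        return math.isnan(obj)
    if isinstance(obj, decimal.Decimal):
        return obj.is_nan()
    return False


def run_over_list(lst):
    """Call the check once per element and discard the result."""
    for item in lst:
        object_is_null(item)


sample = [1, None, 2.5, float("nan"), decimal.Decimal("nan"), object()]
flags = [object_is_null(x) for x in sample]

# Time the tight loop over a larger list; the absolute figure is machine-dependent.
elapsed = timeit.timeit(lambda: run_over_list(sample * 1000), number=10)
```

The type mix in `sample` loosely follows the `('int', 'float', 'object', 'decimal')` parameters of the ASV benchmark in the pull request, since the cost of the check may differ by object type.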



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
