wjones127 commented on a change in pull request #11913:
URL: https://github.com/apache/arrow/pull/11913#discussion_r769866631
##########
File path: docs/source/developers/guide/tutorials/python_tutorial.rst
##########
@@ -25,3 +25,532 @@
***************
Python tutorial
***************
+
+In this tutorial we will make an actual feature contribution to
+Arrow following the steps specified by :ref:`quick-ref-guide`
+section of the guide and a more detailed :ref:`step_by_step`
+section. Navigate there whenever there is some information
+you may find is missing here.
+
+The feature contribution will be added to the compute module
+in PyArrow. But you can also follow the steps in case you are
+correcting a bug or adding a binding.
+
+This tutorial is different from the :ref:`step_by_step` as we
+will be working on a specific case. This tutorial is not meant
+as a step by step guide.
+
+**Lets start!**
+
+Set up
+------
+
+Lets setup the Arrow repository. We presume here that Git is
Review comment:
```suggestion
Let's set up the Arrow repository. We presume here that Git is
```
##########
File path: docs/source/developers/guide/tutorials/python_tutorial.rst
##########
@@ -25,3 +25,532 @@
***************
Python tutorial
***************
+
+In this tutorial we will make an actual feature contribution to
+Arrow following the steps specified by :ref:`quick-ref-guide`
+section of the guide and a more detailed :ref:`step_by_step`
+section. Navigate there whenever there is some information
+you may find is missing here.
+
+The feature contribution will be added to the compute module
+in PyArrow. But you can also follow the steps in case you are
+correcting a bug or adding a binding.
+
+This tutorial is different from the :ref:`step_by_step` as we
+will be working on a specific case. This tutorial is not meant
+as a step by step guide.
+
+**Lets start!**
+
+Set up
+------
+
+Lets setup the Arrow repository. We presume here that Git is
+already installed. Otherwise please see the :ref:`set-up` section.
+
+Once the `Apache Arrow repository <https://github.com/apache/arrow>`_
+is forked we will clone it and add the link of the main repository
+to our upstream.
+
+.. code:: console
+
+ $ git clone https://github.com/<your username>/arrow.git
+ $ cd arrow
+ $ git remote add upstream https://github.com/apache/arrow
+
+Building PyArrow
+----------------
+
+Script for building PyArrow differs depending on the Operating
+System you are using. For this reason we will only refer to
+the instructions for the building process in this tutorial.
+
+.. seealso::
+
+ For the **introduction** to the building process refer to the
+ :ref:`build-arrow-guide` section.
+
+ For the **instructions** on how to build PyArrow refer to the
+ :ref:`build_pyarrow` section.
+
+Create a JIRA issue for the new feature
+---------------------------------------
+
+We will add a new feature that imitates an existing function
+``min_max`` from the ``arrow.compute`` module but makes the
+interval bigger by 1 in both directions. Note that this is a
+made-up function for the purpose of this guide.
+
+See the example of the ``pc.min_max`` in
+`this link
<https://arrow.apache.org/cookbook/py/data.html#computing-mean-min-max-values-of-an-array>`_.
+
+First we need to create a JIRA issue as it doesn't exist yet.
+With a JIRA account created we will navigate to the
+`Apache Arrow JIRA dashboard <https://issues.apache.org/jira/projects/ARROW>`_
+and click on the **Create** button.
+
+.. figure:: python_tutorial_jira_title.jpeg
+ :scale: 70 %
+ :alt: JIRA dashboard with a window for creating a new issue.
+
+ Creating a JIRA issue, adding title (summary) and components.
+
+.. figure:: python_tutorial_jira_description.jpeg
+ :scale: 70 %
+ :alt: JIRA dashboard with a window for creating a
+ description for the new issue.
+
+ Creating a JIRA issue, adding a description.
+
+We will also add some comments to start a conversation.
+
+.. figure:: python_tutorial_jira_comment.jpeg
+ :scale: 50 %
+ :alt: JIRA issue page where comment is being added.
+
+ Adding a comment to the JIRA ticket we created.
+
+We have successfully created a new JIRA issue with index ARROW-14977.
+
+.. figure:: python_tutorial_jira_issue.jpeg
+ :scale: 50 %
+ :alt: JIRA page of the issue just created.
+
+ Our JIRA issue. Yay!
Review comment:
This is done in the screenshot, but I think it is worth calling out to the
reader: "Make sure you assign yourself to the issue to let others know you are
working on it."
##########
File path: docs/source/developers/guide/tutorials/python_tutorial.rst
##########
@@ -25,3 +25,513 @@
***************
Python tutorial
***************
+
+In this tutorial we will make an actual feature contribution to
+Arrow following the steps specified by :ref:`quick-ref-guide`
+section of the guide and a more detailed :ref:`step_by_step`
+section. Navigate there whenever there is some information
+you may find is missing here.
+
+The feature contribution will be added to the compute module
+in PyArrow. But you can also follow the steps in case you are
+correcting a bug or adding a binding.
+
+This tutorial is different from the :ref:`step_by_step` as we
+will be working on a specific case. This tutorial is not meant
+as a step by step guide.
Review comment:
```suggestion
will be working on a specific case. This tutorial is not meant
as a step-by-step guide.
```
##########
File path: docs/source/developers/guide/tutorials/python_tutorial.rst
##########
@@ -25,3 +25,532 @@
***************
Python tutorial
***************
+
+In this tutorial we will make an actual feature contribution to
+Arrow following the steps specified by :ref:`quick-ref-guide`
+section of the guide and a more detailed :ref:`step_by_step`
+section. Navigate there whenever there is some information
+you may find is missing here.
+
+The feature contribution will be added to the compute module
+in PyArrow. But you can also follow the steps in case you are
+correcting a bug or adding a binding.
+
+This tutorial is different from the :ref:`step_by_step` as we
+will be working on a specific case. This tutorial is not meant
+as a step by step guide.
+
+**Lets start!**
+
+Set up
+------
+
+Lets setup the Arrow repository. We presume here that Git is
+already installed. Otherwise please see the :ref:`set-up` section.
+
+Once the `Apache Arrow repository <https://github.com/apache/arrow>`_
+is forked we will clone it and add the link of the main repository
+to our upstream.
+
+.. code:: console
+
+ $ git clone https://github.com/<your username>/arrow.git
+ $ cd arrow
+ $ git remote add upstream https://github.com/apache/arrow
+
+Building PyArrow
+----------------
+
+Script for building PyArrow differs depending on the Operating
+System you are using. For this reason we will only refer to
+the instructions for the building process in this tutorial.
+
+.. seealso::
+
+ For the **introduction** to the building process refer to the
+ :ref:`build-arrow-guide` section.
+
+ For the **instructions** on how to build PyArrow refer to the
+ :ref:`build_pyarrow` section.
+
+Create a JIRA issue for the new feature
+---------------------------------------
+
+We will add a new feature that imitates an existing function
+``min_max`` from the ``arrow.compute`` module but makes the
+interval bigger by 1 in both directions. Note that this is a
+made-up function for the purpose of this guide.
+
+See the example of the ``pc.min_max`` in
+`this link
<https://arrow.apache.org/cookbook/py/data.html#computing-mean-min-max-values-of-an-array>`_.
+
+First we need to create a JIRA issue as it doesn't exist yet.
+With a JIRA account created we will navigate to the
+`Apache Arrow JIRA dashboard <https://issues.apache.org/jira/projects/ARROW>`_
+and click on the **Create** button.
+
+.. figure:: python_tutorial_jira_title.jpeg
+ :scale: 70 %
+ :alt: JIRA dashboard with a window for creating a new issue.
+
+ Creating a JIRA issue, adding title (summary) and components.
+
+.. figure:: python_tutorial_jira_description.jpeg
+ :scale: 70 %
+ :alt: JIRA dashboard with a window for creating a
+ description for the new issue.
+
+ Creating a JIRA issue, adding a description.
+
+We will also add some comments to start a conversation.
+
+.. figure:: python_tutorial_jira_comment.jpeg
+ :scale: 50 %
+ :alt: JIRA issue page where comment is being added.
+
+ Adding a comment to the JIRA ticket we created.
+
+We have successfully created a new JIRA issue with index ARROW-14977.
+
+.. figure:: python_tutorial_jira_issue.jpeg
+ :scale: 50 %
+ :alt: JIRA page of the issue just created.
+
+ Our JIRA issue. Yay!
+
+To see the issue in JIRA follow
+`this link <https://issues.apache.org/jira/browse/ARROW-14977>`_.
+
+.. seealso::
+
+ To get more information on JIRA issues go to
+ :ref:`finding-issues` part of the guide.
+
+Start the work on a new branch
+------------------------------
+
+Before we start working on adding the feature we should
+create a new branch from updated master.
+
+.. code:: console
+
+ $ git checkout master
+ $ git fetch upstream
+ $ git pull --ff-only upstream master
+ $ git checkout -b ARROW-14977
+
+Lets research the Arrow library to see where the ``pc.min_max``
+function is defined/connected with the C++ and get an idea
+where we could implement the new feature.
+
+.. figure:: python_tutorial_github_search.jpeg
+ :scale: 50 %
+ :alt: Apache Arrow GitHub repository dashboard where we are
+ searching for a pc.min_max function reference.
+
+ We could try to search for the function reference in a
+ GitHub Apache Arrow repository.
+
+.. figure:: python_tutorial_github_find_in_file.jpeg
+ :scale: 50 %
+ :alt: In the GitHub repository we are searching through the
+ test_compute.py file for the pc.min_max function.
+
+ And search through the ``test_compute.py`` file in ``pyarrow``
+ folder.
+
+From the search we can see that the function is tested in the
+``python/pyarrow/tests/test_compute.py`` file that would mean the
+function is defined in the ``compute.py`` file.
+
+After examining the ``compute.py`` file we can see that together
+with ``_compute.py`` the functions from C++ get wrapped into Python.
+We will define the new feature at the end of the ``compute.py`` file.
+
+Lets run some code in the Python console from ``arrow/python``
+directory in order to learn more about ``pc.min_max``.
+
+.. code:: console
+
+ $ cd python
+ $ python
+
+ Python 3.9.7 (default, Oct 22 2021, 13:24:00)
+ [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
+ Type "help", "copyright", "credits" or "license" for more information.
+
+We have entered into Python console from the shell and we can
+do some research:
+
+.. code-block:: python
+
+ >>> import pyarrow.compute as pc
+ >>> data = [4, 5, 6, None, 1]
+ >>> data
+ [4, 5, 6, None, 1]
+ >>> pc.min_max(data)
+ <pyarrow.StructScalar: [('min', 1), ('max', 6)]>
+ >>> pc.min_max(data, skip_nulls=False)
+ <pyarrow.StructScalar: [('min', None), ('max', None)]>
+
+We will call our new feature ``pc.tutorial_min_max``. We want the
+result from our function, that takes the same input data, to be
+``[('min-', 0), ('max+', 7)]``. If we specify that the null value should be
+included, the result should be equal to ``pc.min_max`` that is
+``[('min', None), ('max', None)]``.
+
+Lets add the first trial code into ``arrow/python/pyarrow/compute.py``
+where we first test the call to the "min_max" function from C++:
+
+.. code-block:: python
+
+ def tutorial_min_max(values, skip_nulls=True):
+ """
+ Add docstrings
+
+ Parameters
+ ----------
+ values : Array
+
+ Returns
+ -------
+ result : TODO
+
+ Examples
+ --------
+ >>> import pyarrow.compute as pc
+ >>> data = [4, 5, 6, None, 1]
+ >>> pc.tutorial_min_max(data)
+ <pyarrow.StructScalar: [('min-', 0), ('max+', 7)]>
+ """
+
+ options = ScalarAggregateOptions(skip_nulls=skip_nulls)
+ return call_function("min_max", [values], options)
+
+To see if this works we will need to import ``pyarrow.compute``
+again and try:
+
+.. code-block:: python
+
+ >>> import pyarrow.compute as pc
+ >>> data = [4, 5, 6, None, 1]
+ >>> pc.tutorial_min_max(data)
+ <pyarrow.StructScalar: [('min', 1), ('max', 6)]>
+
+It’s working. Now we must correct the limits to get the corrected
+interval. To do that we have to do some research on ``pyarrow.StructScalar``.
+In `test_scalars.py
<https://github.com/apache/arrow/blob/994074d2e7ff073301e0959dbc5bb595a1e2a41b/python/pyarrow/tests/test_scalars.py#L547-L553>`_
+under the ``test_struct_duplicate_fields`` we can see an example
+of how the ``StructScalar`` is created. We could again run the
+Python console and try creating one ourselves.
+
+.. code-block:: python
+
+ >>> import pyarrow as pa
+ >>> ty = pa.struct([
+ ... pa.field('min-', pa.int64()),
+ ... pa.field('max+', pa.int64()),
+ ... ])
+ >>> pa.scalar([('min-', 3), ('max+', 9)], type=ty)
+ <pyarrow.StructScalar: [('min-', 3), ('max+', 9)]>
+
+.. note::
+
+ In cases where we don't yet have good documentation, unit tests
+ can be a good place to look for code examples
Review comment:
```suggestion
can be a good place to look for code examples.
```
##########
File path: docs/source/developers/guide/tutorials/python_tutorial.rst
##########
@@ -25,3 +25,532 @@
***************
Python tutorial
***************
+
+In this tutorial we will make an actual feature contribution to
+Arrow following the steps specified by :ref:`quick-ref-guide`
+section of the guide and a more detailed :ref:`step_by_step`
+section. Navigate there whenever there is some information
+you may find is missing here.
+
+The feature contribution will be added to the compute module
+in PyArrow. But you can also follow the steps in case you are
+correcting a bug or adding a binding.
+
+This tutorial is different from the :ref:`step_by_step` as we
+will be working on a specific case. This tutorial is not meant
+as a step by step guide.
+
+**Lets start!**
+
+Set up
+------
+
+Lets setup the Arrow repository. We presume here that Git is
+already installed. Otherwise please see the :ref:`set-up` section.
+
+Once the `Apache Arrow repository <https://github.com/apache/arrow>`_
+is forked we will clone it and add the link of the main repository
+to our upstream.
+
+.. code:: console
+
+ $ git clone https://github.com/<your username>/arrow.git
+ $ cd arrow
+ $ git remote add upstream https://github.com/apache/arrow
+
+Building PyArrow
+----------------
+
+Script for building PyArrow differs depending on the Operating
+System you are using. For this reason we will only refer to
+the instructions for the building process in this tutorial.
+
+.. seealso::
+
+ For the **introduction** to the building process refer to the
+ :ref:`build-arrow-guide` section.
+
+ For the **instructions** on how to build PyArrow refer to the
+ :ref:`build_pyarrow` section.
+
+Create a JIRA issue for the new feature
+---------------------------------------
+
+We will add a new feature that imitates an existing function
+``min_max`` from the ``arrow.compute`` module but makes the
+interval bigger by 1 in both directions. Note that this is a
+made-up function for the purpose of this guide.
+
+See the example of the ``pc.min_max`` in
+`this link
<https://arrow.apache.org/cookbook/py/data.html#computing-mean-min-max-values-of-an-array>`_.
+
+First we need to create a JIRA issue as it doesn't exist yet.
+With a JIRA account created we will navigate to the
+`Apache Arrow JIRA dashboard <https://issues.apache.org/jira/projects/ARROW>`_
+and click on the **Create** button.
+
+.. figure:: python_tutorial_jira_title.jpeg
+ :scale: 70 %
+ :alt: JIRA dashboard with a window for creating a new issue.
+
+ Creating a JIRA issue, adding title (summary) and components.
+
+.. figure:: python_tutorial_jira_description.jpeg
+ :scale: 70 %
+ :alt: JIRA dashboard with a window for creating a
+ description for the new issue.
+
+ Creating a JIRA issue, adding a description.
+
+We will also add some comments to start a conversation.
+
+.. figure:: python_tutorial_jira_comment.jpeg
+ :scale: 50 %
+ :alt: JIRA issue page where comment is being added.
+
+ Adding a comment to the JIRA ticket we created.
+
+We have successfully created a new JIRA issue with index ARROW-14977.
+
+.. figure:: python_tutorial_jira_issue.jpeg
+ :scale: 50 %
+ :alt: JIRA page of the issue just created.
+
+ Our JIRA issue. Yay!
+
+To see the issue in JIRA follow
+`this link <https://issues.apache.org/jira/browse/ARROW-14977>`_.
+
+.. seealso::
+
+ To get more information on JIRA issues go to
+ :ref:`finding-issues` part of the guide.
+
+Start the work on a new branch
+------------------------------
+
+Before we start working on adding the feature we should
+create a new branch from updated master.
+
+.. code:: console
+
+ $ git checkout master
+ $ git fetch upstream
+ $ git pull --ff-only upstream master
+ $ git checkout -b ARROW-14977
+
+Lets research the Arrow library to see where the ``pc.min_max``
+function is defined/connected with the C++ and get an idea
+where we could implement the new feature.
+
+.. figure:: python_tutorial_github_search.jpeg
+ :scale: 50 %
+ :alt: Apache Arrow GitHub repository dashboard where we are
+ searching for a pc.min_max function reference.
+
+ We could try to search for the function reference in a
+ GitHub Apache Arrow repository.
+
+.. figure:: python_tutorial_github_find_in_file.jpeg
+ :scale: 50 %
+ :alt: In the GitHub repository we are searching through the
+ test_compute.py file for the pc.min_max function.
+
+ And search through the ``test_compute.py`` file in ``pyarrow``
+ folder.
+
+From the search we can see that the function is tested in the
+``python/pyarrow/tests/test_compute.py`` file that would mean the
+function is defined in the ``compute.py`` file.
+
+After examining the ``compute.py`` file we can see that together
+with ``_compute.py`` the functions from C++ get wrapped into Python.
+We will define the new feature at the end of the ``compute.py`` file.
+
+Lets run some code in the Python console from ``arrow/python``
+directory in order to learn more about ``pc.min_max``.
+
+.. code:: console
+
+ $ cd python
+ $ python
+
+ Python 3.9.7 (default, Oct 22 2021, 13:24:00)
+ [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
+ Type "help", "copyright", "credits" or "license" for more information.
+
+We have entered into Python console from the shell and we can
Review comment:
```suggestion
We have entered into the Python console from the shell and we can
```
##########
File path: docs/source/developers/guide/tutorials/python_tutorial.rst
##########
@@ -25,3 +25,532 @@
***************
Python tutorial
***************
+
+In this tutorial we will make an actual feature contribution to
+Arrow following the steps specified by :ref:`quick-ref-guide`
+section of the guide and a more detailed :ref:`step_by_step`
+section. Navigate there whenever there is some information
+you may find is missing here.
+
+The feature contribution will be added to the compute module
+in PyArrow. But you can also follow the steps in case you are
+correcting a bug or adding a binding.
+
+This tutorial is different from the :ref:`step_by_step` as we
+will be working on a specific case. This tutorial is not meant
+as a step by step guide.
+
+**Lets start!**
+
+Set up
+------
+
+Lets setup the Arrow repository. We presume here that Git is
+already installed. Otherwise please see the :ref:`set-up` section.
+
+Once the `Apache Arrow repository <https://github.com/apache/arrow>`_
+is forked we will clone it and add the link of the main repository
+to our upstream.
+
+.. code:: console
+
+ $ git clone https://github.com/<your username>/arrow.git
+ $ cd arrow
+ $ git remote add upstream https://github.com/apache/arrow
+
+Building PyArrow
+----------------
+
+Script for building PyArrow differs depending on the Operating
+System you are using. For this reason we will only refer to
+the instructions for the building process in this tutorial.
+
+.. seealso::
+
+ For the **introduction** to the building process refer to the
+ :ref:`build-arrow-guide` section.
+
+ For the **instructions** on how to build PyArrow refer to the
+ :ref:`build_pyarrow` section.
+
+Create a JIRA issue for the new feature
+---------------------------------------
+
+We will add a new feature that imitates an existing function
+``min_max`` from the ``arrow.compute`` module but makes the
+interval bigger by 1 in both directions. Note that this is a
+made-up function for the purpose of this guide.
+
+See the example of the ``pc.min_max`` in
+`this link
<https://arrow.apache.org/cookbook/py/data.html#computing-mean-min-max-values-of-an-array>`_.
+
+First we need to create a JIRA issue as it doesn't exist yet.
+With a JIRA account created we will navigate to the
+`Apache Arrow JIRA dashboard <https://issues.apache.org/jira/projects/ARROW>`_
+and click on the **Create** button.
+
+.. figure:: python_tutorial_jira_title.jpeg
+ :scale: 70 %
+ :alt: JIRA dashboard with a window for creating a new issue.
+
+ Creating a JIRA issue, adding title (summary) and components.
+
+.. figure:: python_tutorial_jira_description.jpeg
+ :scale: 70 %
+ :alt: JIRA dashboard with a window for creating a
+ description for the new issue.
+
+ Creating a JIRA issue, adding a description.
+
+We will also add some comments to start a conversation.
+
+.. figure:: python_tutorial_jira_comment.jpeg
+ :scale: 50 %
+ :alt: JIRA issue page where comment is being added.
+
+ Adding a comment to the JIRA ticket we created.
+
+We have successfully created a new JIRA issue with index ARROW-14977.
+
+.. figure:: python_tutorial_jira_issue.jpeg
+ :scale: 50 %
+ :alt: JIRA page of the issue just created.
+
+ Our JIRA issue. Yay!
+
+To see the issue in JIRA follow
+`this link <https://issues.apache.org/jira/browse/ARROW-14977>`_.
+
+.. seealso::
+
+ To get more information on JIRA issues go to
+ :ref:`finding-issues` part of the guide.
+
+Start the work on a new branch
+------------------------------
+
+Before we start working on adding the feature we should
+create a new branch from updated master.
+
+.. code:: console
+
+ $ git checkout master
+ $ git fetch upstream
+ $ git pull --ff-only upstream master
+ $ git checkout -b ARROW-14977
+
+Lets research the Arrow library to see where the ``pc.min_max``
+function is defined/connected with the C++ and get an idea
+where we could implement the new feature.
+
+.. figure:: python_tutorial_github_search.jpeg
+ :scale: 50 %
+ :alt: Apache Arrow GitHub repository dashboard where we are
+ searching for a pc.min_max function reference.
+
+ We could try to search for the function reference in a
+ GitHub Apache Arrow repository.
+
+.. figure:: python_tutorial_github_find_in_file.jpeg
+ :scale: 50 %
+ :alt: In the GitHub repository we are searching through the
+ test_compute.py file for the pc.min_max function.
+
+ And search through the ``test_compute.py`` file in ``pyarrow``
+ folder.
+
+From the search we can see that the function is tested in the
+``python/pyarrow/tests/test_compute.py`` file that would mean the
+function is defined in the ``compute.py`` file.
+
+After examining the ``compute.py`` file we can see that together
+with ``_compute.py`` the functions from C++ get wrapped into Python.
Review comment:
It's either pyx or pyd, right?
```suggestion
with ``_compute.pyx`` the functions from C++ get wrapped into Python.
```