[GitHub] [arrow] danepitkin commented on a diff in pull request #35396: GH-35390: [Python] Consolidate some APIs in Table and RecordBatch

via GitHub Wed, 10 May 2023 12:18:19 -0700


danepitkin commented on code in PR #35396:
URL: https://github.com/apache/arrow/pull/35396#discussion_r1190306435



##########
python/pyarrow/table.pxi:
##########
@@ -1963,106 +2113,16 @@ cdef class RecordBatch(_Tabular):
         if self._schema is None:
             self._schema = pyarrow_wrap_schema(self.batch.schema())
 
-        return self._schema
-
-    def field(self, i):
-        """
-        Select a schema field by its column name or numeric index
-
-        Parameters
-        ----------
-        i : int or string
-            The index or name of the field to retrieve
-
-        Returns
-        -------
-        pyarrow.Field
-
-        Examples
-        --------
-        >>> import pyarrow as pa
-        >>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
-        >>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
-        >>> batch = pa.RecordBatch.from_arrays([n_legs, animals],
-        ...                                     names=["n_legs", "animals"])
-        >>> batch.field(0)
-        pyarrow.Field<n_legs: int64>
-        >>> batch.field(1)
-        pyarrow.Field<animals: string>
-        """
-        return self.schema.field(i)
-
-    @property
-    def columns(self):
-        """
-        List of all columns in numerical order
-
-        Returns
-        -------
-        list of pyarrow.Array
-
-        Examples
-        --------
-        >>> import pyarrow as pa
-        >>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
-        >>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
-        >>> batch = pa.RecordBatch.from_arrays([n_legs, animals],
-        ...                                     names=["n_legs", "animals"])
-        >>> batch.columns
-        [<pyarrow.lib.Int64Array object at ...>
-        [
-          2,
-          2,
-          4,
-          4,
-          5,
-          100
-        ], <pyarrow.lib.StringArray object at ...>
-        [
-          "Flamingo",
-          "Parrot",
-          "Dog",
-          "Horse",
-          "Brittle stars",
-          "Centipede"
-        ]]
-        """
-        return [self.column(i) for i in range(self.num_columns)]
-
-    def _ensure_integer_index(self, i):
-        """
-        Ensure integer index (convert string column name to integer if needed).
-        """
-        if isinstance(i, (bytes, str)):
-            field_indices = self.schema.get_all_field_indices(i)
-
-            if len(field_indices) == 0:
-                raise KeyError(
-                    "Field \"{}\" does not exist in record batch schema"
-                    .format(i))
-            elif len(field_indices) > 1:
-                raise KeyError(
-                    "Field \"{}\" exists {} times in record batch schema"
-                    .format(i, len(field_indices)))
-            else:
-                return field_indices[0]
-        elif isinstance(i, int):
-            return i
-        else:
-            raise TypeError("Index must either be string or integer")
+        return self._schema
 
-    def column(self, i):
+    @property
+    def columns(self):

Review Comment:
   Absolutely!
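
   For readers following the hunk above, the name-or-index lookup done by the removed `_ensure_integer_index` can be sketched in plain Python. This is a standalone illustration under stated assumptions, not the pyarrow code: `resolve_column_index` is a hypothetical name, and the `names` list stands in for what `self.schema.get_all_field_indices(i)` consults.

```python
def resolve_column_index(names, i):
    """Map a column name or an integer index to an integer index.

    `names` is a hypothetical list of column names standing in for a schema.
    """
    if isinstance(i, (bytes, str)):
        # Assumption for this sketch: bytes names are UTF-8 encoded.
        if isinstance(i, bytes):
            i = i.decode("utf-8")
        # Collect every position where the name occurs; duplicates are possible.
        matches = [pos for pos, name in enumerate(names) if name == i]
        if len(matches) == 0:
            raise KeyError('Field "{}" does not exist in schema'.format(i))
        elif len(matches) > 1:
            raise KeyError(
                'Field "{}" exists {} times in schema'.format(i, len(matches)))
        return matches[0]
    elif isinstance(i, int):
        # Integer indices pass through unchanged.
        return i
    else:
        raise TypeError("Index must either be string or integer")
```

   A missing name and a duplicated name both raise `KeyError`, so a caller never silently picks one of several identically named columns.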



##########
python/pyarrow/table.pxi:
##########
@@ -1500,6 +1627,68 @@ cdef class _Tabular(_PandasConvertible):
         """
         return _pc().drop_null(self)
 
+    def field(self, i):
+        """
+        Select a schema field by its column name or numeric index.
+
+        Parameters
+        ----------
+        i : int or string
+            The index or name of the field to retrieve.
+
+        Returns
+        -------
+        Field
+
+        Examples
+        --------
+        Table (works similarly for RecordBatch)
+
+        >>> import pyarrow as pa
+        >>> import pandas as pd
+        >>> df = pd.DataFrame({'n_legs': [2, 4, 5, 100],
+        ...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
+        >>> table = pa.Table.from_pandas(df)
+        >>> table.field(0)
+        pyarrow.Field<n_legs: int64>
+        >>> table.field(1)
+        pyarrow.Field<animals: string>
+        """
+        return self.schema.field(i)
+
+    @property
+    def schema(self):
+        raise NotImplementedError
+
+    def sort_by(self, sorting, **kwargs):
+        """
+        Sort the Table or RecordBatch by one or multiple columns.
+
+        Parameters
+        ----------
+        sorting : str or list[tuple(name, order)]
+            Name of the column to use to sort (ascending), or
+            a list of multiple sorting conditions where
+            each entry is a tuple with column name
+            and sorting order ("ascending" or "descending")
+        **kwargs : dict, optional
+            Additional sorting options.
+            As allowed by :class:`SortOptions`
+
+        Returns
+        -------
+        Table or RecordBatch
+            A new tabular object sorted according to the sort keys.

Review Comment:
   Good catch! Will do.
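
   The consolidation pattern in this hunk (a shared `field` written once on the base class, with `schema` left for subclasses to supply) can be sketched as below. This is a minimal pure-Python sketch, not the Cython in `table.pxi`: `TabularSketch` and `BatchSketch` are hypothetical names, and the schema is modeled as a list of `(name, type)` tuples purely for illustration.

```python
class TabularSketch:
    """Hypothetical base class: shared methods are written once against `schema`."""

    @property
    def schema(self):
        # The base class has no schema of its own; subclasses must override this.
        raise NotImplementedError

    def field(self, i):
        # Shared implementation: accept either a column name or an integer index.
        fields = self.schema
        if isinstance(i, str):
            matches = [f for f in fields if f[0] == i]
            if len(matches) != 1:
                raise KeyError(i)
            return matches[0]
        return fields[i]


class BatchSketch(TabularSketch):
    """Hypothetical subclass that only has to provide `schema`."""

    def __init__(self, fields):
        self._fields = list(fields)

    @property
    def schema(self):
        return self._fields


batch = BatchSketch([("n_legs", "int64"), ("animals", "string")])
first = batch.field(0)
by_name = batch.field("animals")
```

   With this layout, a method such as `field` lives in one place, and each concrete class only supplies the `schema` property it depends on.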



