This is an automated email from the ASF dual-hosted git repository.
brycemecum pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 2bbd67dc42 MINOR: [Docs][Python] Document type aliasing in
pa.field/pa.schema (#44512)
2bbd67dc42 is described below
commit 2bbd67dc42c64bb5185b6b9147b777d9230e76e9
Author: Bryce Mecum <[email protected]>
AuthorDate: Wed Oct 23 13:20:56 2024 -0700
MINOR: [Docs][Python] Document type aliasing in pa.field/pa.schema (#44512)
### Rationale for this change
PyArrow supports a set of type aliases (e.g., "string" is an alias for
pa.string()), and these aliases are resolved in calls to `pa.field` and
`pa.schema`. Prior to this change, they weren't documented.
Note: I didn't think we wanted to deprecate these, but if any reviewers want
to discuss that, let me know. The R package doesn't support a similar aliasing
mechanism.
### What changes are included in this PR?
Updates to docs. One regression test.
### Are these changes tested?
Yes.
### Are there any user-facing changes?
Better docs.
Authored-by: Bryce Mecum <[email protected]>
Signed-off-by: Bryce Mecum <[email protected]>
---
python/pyarrow/tests/test_types.py | 7 +++++++
python/pyarrow/types.pxi | 28 ++++++++++++++++++++++++++--
2 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/python/pyarrow/tests/test_types.py b/python/pyarrow/tests/test_types.py
index 2a05f87615..fef350d5de 100644
--- a/python/pyarrow/tests/test_types.py
+++ b/python/pyarrow/tests/test_types.py
@@ -1153,6 +1153,13 @@ def test_field_basic():
pa.field('foo', None)
+def test_field_datatype_alias():
+ f = pa.field('foo', 'string')
+
+ assert f.name == 'foo'
+ assert f.type is pa.string()
+
+
def test_field_equals():
meta1 = {b'foo': b'bar'}
meta2 = {b'bizz': b'bazz'}
diff --git a/python/pyarrow/types.pxi b/python/pyarrow/types.pxi
index c66ac5f28d..4aa8238556 100644
--- a/python/pyarrow/types.pxi
+++ b/python/pyarrow/types.pxi
@@ -3713,8 +3713,8 @@ def field(name, type=None, nullable=None, metadata=None):
Name of the field.
Alternatively, you can also pass an object that implements the Arrow
PyCapsule Protocol for schemas (has an ``__arrow_c_schema__`` method).
- type : pyarrow.DataType
- Arrow datatype of the field.
+ type : pyarrow.DataType or str
+ Arrow datatype of the field or a string matching one.
nullable : bool, default True
Whether the field's values are nullable.
metadata : dict, default None
@@ -3746,6 +3746,11 @@ def field(name, type=None, nullable=None, metadata=None):
>>> pa.struct([field])
StructType(struct<key: int32>)
+
+ A str can also be passed for the type parameter:
+
+ >>> pa.field('key', 'int32')
+ pyarrow.Field<key: int32>
"""
if hasattr(name, "__arrow_c_schema__"):
if type is not None:
@@ -5717,6 +5722,25 @@ def schema(fields, metadata=None):
some_int: int32
some_string: string
+ DataTypes can also be passed as strings. The following is equivalent to the
+ above example:
+
+ >>> pa.schema([
+ ... pa.field('some_int', "int32"),
+ ... pa.field('some_string', "string")
+ ... ])
+ some_int: int32
+ some_string: string
+
+ Or more concisely:
+
+ >>> pa.schema([
+ ... ('some_int', "int32"),
+ ... ('some_string', "string")
+ ... ])
+ some_int: int32
+ some_string: string
+
Returns
-------
schema : pyarrow.Schema