This is an automated email from the ASF dual-hosted git repository.
raulcd pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new cd848bcb07 GH-47734: [Python] Fix hypothesis timedelta bounds for
duration/interval types (#48460)
cd848bcb07 is described below
commit cd848bcb07e1513bb0bc40e82e2ca3466729f223
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Fri Dec 12 22:47:50 2025 +0900
GH-47734: [Python] Fix hypothesis timedelta bounds for duration/interval
types (#48460)
### Rationale for this change
Unbounded hypothesis timedeltas overflow int64 storage when converted to
duration[ns]; this adds safe bounds, as we already do for timestamps.
Judging from the code, I think the overflow happens here:
https://github.com/apache/arrow/blob/203437b4d6848885de72f32bfb3017919373a736/python/pyarrow/tests/strategies.py#L144
https://github.com/HypothesisWorks/hypothesis/blob/7288aa8f07f6ba61093b1eac6571d13632f31a54/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py#L347C5-L350
A simple example would be:
```python
pa.array([datetime.timedelta.max], type=pa.duration('ns'))
```
Disclaimer: I cannot reproduce it locally, so I can't confirm that the
above is correct. There may be something else going on, but I think it's
good to set the bounds in any event.
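The arithmetic behind the suspected overflow can be checked without pyarrow. This is only a sketch, assuming that duration[ns] stores a signed 64-bit count of nanoseconds; the helper name `to_nanoseconds` is hypothetical and not part of pyarrow or hypothesis:

```python
import datetime

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
NS_PER_US = 1_000


def to_nanoseconds(delta):
    """Exact nanosecond count of a timedelta (hypothetical helper)."""
    # timedelta // timedelta yields an exact int, so no float rounding occurs.
    return (delta // datetime.timedelta(microseconds=1)) * NS_PER_US


# Assumption: duration[ns] is backed by int64 nanoseconds.
max_ns = to_nanoseconds(datetime.timedelta.max)
fits_int64 = INT64_MIN <= max_ns <= INT64_MAX
print(fits_int64)  # False
```

Under that assumption, `datetime.timedelta.max` is several orders of magnitude beyond what int64 nanoseconds can hold, which is consistent with the hypothesis above but does not confirm where the failure actually occurs.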
### What changes are included in this PR?
Explicitly set the bounds for `st.timedeltas` in hypothesis. For
nanoseconds, the bounds are about 90% of the int64 capacity.
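For reference, the nanosecond bound can be related to the int64 capacity as follows. This is a sketch of one way to arrive at a figure of that size, not necessarily how the value in the patch was derived (the patch describes it as empirically tested); the variable names are illustrative only:

```python
import datetime

INT64_MAX = 2**63 - 1
NS_PER_DAY = 86_400 * 10**9

# Largest whole number of days that fits in int64 nanoseconds.
capacity_days = INT64_MAX // NS_PER_DAY

# 90% of that capacity, rounded down to whole days.
ninety_percent_days = capacity_days * 9 // 10

# A timedelta at this bound stays well inside the int64 range.
bound = datetime.timedelta(days=ninety_percent_days)
bound_ns = (bound // datetime.timedelta(microseconds=1)) * 1_000

print(capacity_days)        # 106751
print(ninety_percent_days)  # 96075
```

The result matches the `96_075` days used in the patch, leaving roughly 10% headroom below the int64 limit.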
### Are these changes tested?
Passed in https://github.com/apache/arrow/pull/48460#issuecomment-3641542577
### Are there any user-facing changes?
No, test-only.
* GitHub Issue: #47734
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
---
python/pyarrow/tests/strategies.py | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/python/pyarrow/tests/strategies.py b/python/pyarrow/tests/strategies.py
index 5ce5602298..218176dbc5 100644
--- a/python/pyarrow/tests/strategies.py
+++ b/python/pyarrow/tests/strategies.py
@@ -323,9 +323,25 @@ def arrays(draw, type, size=None, nullable=True):
         value = st.datetimes(timezones=st.just(tz), min_value=min_datetime,
                              max_value=max_datetime)
     elif pa.types.is_duration(ty):
-        value = st.timedeltas()
+        if ty.unit in ('s', 'ms'):
+            min_value = datetime.timedelta.min
+            max_value = datetime.timedelta.max
+        elif ty.unit == 'us':
+            max_int64 = 2**63 - 1
+            max_days = max_int64 // (86400 * 10**6)
+            min_value = datetime.timedelta(days=-max_days)
+            max_value = datetime.timedelta(days=max_days)
+        else:  # 'ns'
+            # Empirically tested value
+            min_value = datetime.timedelta(days=-96_075)
+            max_value = datetime.timedelta(days=96_075)
+        value = st.timedeltas(min_value=min_value, max_value=max_value)
     elif pa.types.is_interval(ty):
-        value = st.timedeltas()
+        # Empirically tested value
+        value = st.timedeltas(
+            min_value=datetime.timedelta(days=-96_075),
+            max_value=datetime.timedelta(days=96_075)
+        )
     elif pa.types.is_binary(ty) or pa.types.is_large_binary(ty):
         value = st.binary()
     elif pa.types.is_string(ty) or pa.types.is_large_string(ty):