This is an automated email from the ASF dual-hosted git repository.

raulcd pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new cd848bcb07 GH-47734: [Python] Fix hypothesis timedelta bounds for duration/interval types (#48460)
cd848bcb07 is described below

commit cd848bcb07e1513bb0bc40e82e2ca3466729f223
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Fri Dec 12 22:47:50 2025 +0900

    GH-47734: [Python] Fix hypothesis timedelta bounds for duration/interval types (#48460)
    
    ### Rationale for this change
    
    Unbounded hypothesis timedeltas overflow int64 storage when converted to
    duration[ns]; this adds safe bounds, as we already do for timestamps.
    
    Judging from the code, I think the overflow happens here:
    
    https://github.com/apache/arrow/blob/203437b4d6848885de72f32bfb3017919373a736/python/pyarrow/tests/strategies.py#L144
    
    https://github.com/HypothesisWorks/hypothesis/blob/7288aa8f07f6ba61093b1eac6571d13632f31a54/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py#L347C5-L350
    
    A simple example would be:
    
    ```python
    import datetime
    import pyarrow as pa

    pa.array([datetime.timedelta.max], type=pa.duration('ns'))
    ```
    
    Disclaimer: I cannot reproduce it locally, so I can't confirm that the
    above is correct. There may be another cause, but I think it's good to set
    the bounds in any event.
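
To make the suspected overflow concrete, here is a minimal standard-library sketch (not part of the PR) that checks whether a `datetime.timedelta` fits in a signed 64-bit nanosecond count. The helper names `to_nanoseconds` and `fits_int64_ns` are hypothetical, and the sketch says nothing about how pyarrow itself reacts to an out-of-range value.

```python
import datetime

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def to_nanoseconds(td):
    """Exact nanosecond count of a timedelta, using Python's unbounded ints."""
    return ((td.days * 86_400 + td.seconds) * 10**6 + td.microseconds) * 1_000


def fits_int64_ns(td):
    """True if the timedelta can be stored as a signed 64-bit nanosecond count."""
    return INT64_MIN <= to_nanoseconds(td) <= INT64_MAX


# timedelta.max spans 999,999,999 days, far beyond what int64 nanoseconds hold.
print(fits_int64_ns(datetime.timedelta.max))
# A value inside the bounds chosen in this change.
print(fits_int64_ns(datetime.timedelta(days=96_075)))
```

Under this arithmetic, `datetime.timedelta.max` and `datetime.timedelta.min` both fall outside the int64 range, while values within ±96,075 days fit.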
    
    ### What changes are included in this PR?
    
    Explicitly set the bounds for `st.timedeltas` in hypothesis. For
    nanosecond units, the bounds are about 90% of the int64 capacity.
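
As a rough illustration of where per-unit bounds like these come from, the sketch below computes the largest whole number of days whose tick count fits in int64 for each duration unit. It is an assumption-labelled aid, not the PR's code: `TICKS_PER_SECOND` and `max_days_in_int64` are hypothetical names, and the PR itself describes the 96,075-day figure as empirically tested.

```python
import datetime

INT64_MAX = 2**63 - 1
SECONDS_PER_DAY = 86_400

# Ticks per second for each duration unit (hypothetical helper table).
TICKS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def max_days_in_int64(unit):
    """Largest whole number of days whose tick count fits in int64."""
    return INT64_MAX // (SECONDS_PER_DAY * TICKS_PER_SECOND[unit])


for unit in ("s", "ms", "us", "ns"):
    days = max_days_in_int64(unit)
    # If int64 holds more days than timedelta can represent, the full
    # timedelta range is safe for that unit.
    full_range_ok = days >= datetime.timedelta.max.days
    print(unit, days, full_range_ok)

# 90% of the nanosecond capacity, in whole days.
print(max_days_in_int64("ns") * 9 // 10)
```

By this calculation the `s` and `ms` units can hold the full `timedelta` range, `us` needs a bound in days, and 90% of the `ns` capacity works out to 96,075 days, which matches the value used in the diff.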
    
    ### Are these changes tested?
    
    Passed in https://github.com/apache/arrow/pull/48460#issuecomment-3641542577
    
    ### Are there any user-facing changes?
    
    No, test-only.
    
    * GitHub Issue: #47734
    
    Authored-by: Hyukjin Kwon <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
---
 python/pyarrow/tests/strategies.py | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/python/pyarrow/tests/strategies.py b/python/pyarrow/tests/strategies.py
index 5ce5602298..218176dbc5 100644
--- a/python/pyarrow/tests/strategies.py
+++ b/python/pyarrow/tests/strategies.py
@@ -323,9 +323,25 @@ def arrays(draw, type, size=None, nullable=True):
         value = st.datetimes(timezones=st.just(tz), min_value=min_datetime,
                              max_value=max_datetime)
     elif pa.types.is_duration(ty):
-        value = st.timedeltas()
+        if ty.unit in ('s', 'ms'):
+            min_value = datetime.timedelta.min
+            max_value = datetime.timedelta.max
+        elif ty.unit == 'us':
+            max_int64 = 2**63 - 1
+            max_days = max_int64 // (86400 * 10**6)
+            min_value = datetime.timedelta(days=-max_days)
+            max_value = datetime.timedelta(days=max_days)
+        else:  # 'ns'
+            # Empirically tested value
+            min_value = datetime.timedelta(days=-96_075)
+            max_value = datetime.timedelta(days=96_075)
+        value = st.timedeltas(min_value=min_value, max_value=max_value)
     elif pa.types.is_interval(ty):
-        value = st.timedeltas()
+        # Empirically tested value
+        value = st.timedeltas(
+            min_value=datetime.timedelta(days=-96_075),
+            max_value=datetime.timedelta(days=96_075)
+        )
     elif pa.types.is_binary(ty) or pa.types.is_large_binary(ty):
         value = st.binary()
     elif pa.types.is_string(ty) or pa.types.is_large_string(ty):
