rok commented on code in PR #47199:
URL: https://github.com/apache/arrow/pull/47199#discussion_r2240170410


##########
python/pyarrow/tests/parquet/common.py:
##########
@@ -121,6 +121,11 @@ def _test_dataframe(size=10000, seed=0):
     return df
 
 
+def _test_table(size=10000, seed=0):
+    df = _test_dataframe(size, seed)
+    return pa.Table.from_pandas(df, preserve_index=False)

Review Comment:
   Doesn't `_test_dataframe` use pandas? Depending on pandas would run counter to 
the intent [stated here](https://github.com/apache/arrow/issues/47172):
   > This issue would move some of tests using _test_dataframe to use a new 
utility function and remove the @pytest.mark.pandas in this cases.
   
   You could move the NumPy logic from `_test_dataframe` into `_test_table` and 
have `_test_dataframe` call it, like:
   
   ```python
   # I've not tested this
   
   def _test_table(size=10000, seed=0):
       np.random.seed(seed)
       # Use the pa.table() factory; pa.Table cannot be instantiated directly
       return pa.table({
           'uint8': _random_integers(size, np.uint8),
           'uint16': _random_integers(size, np.uint16),
           'uint32': _random_integers(size, np.uint32),
           'uint64': _random_integers(size, np.uint64),
           'int8': _random_integers(size, np.int8),
           'int16': _random_integers(size, np.int16),
           'int32': _random_integers(size, np.int32),
           'int64': _random_integers(size, np.int64),
           'float32': np.random.randn(size).astype(np.float32),
           'float64': np.arange(size, dtype=np.float64),
           'bool': np.random.randn(size) > 0,
           'strings': [util.rands(10) for i in range(size)],
           'all_none': [None] * size,
           'all_none_category': [None] * size
       })
   
   def _test_dataframe(size=10000, seed=0):
       # _test_table already seeds NumPy, and to_pandas() imports pandas itself
       return _test_table(size, seed).to_pandas()
   ```
   
   Possibly out of scope:
   It might even be good to have fallback logic in `_test_table` for cases where 
NumPy is not available. This logic could use the stdlib's `random` module or 
some testing utility we have available in Arrow C++.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
