honno opened a new issue, #34774:

URL: https://github.com/apache/arrow/issues/34774
### Describe the bug, including details regarding any error messages, version, and platform.

It seems the interchange `Column.null_count` ([relevant spec](https://github.com/data-apis/dataframe-api/blob/d10a096ccc1612ec4acf8503aff58b3acb4e3738/protocol/dataframe_protocol.py#L311-L313)) has erroneous behaviour:

```python
>>> import pyarrow as pa
>>> pa.__version__
'12.0.0.dev304'  # from https://pypi.fury.io/arrow-nightlies/
>>> df = pa.table([pa.array([float("nan")], type=pa.float64())], ["foo"])
>>> dfi = df.__dataframe__()
>>> col = dfi.get_column(0)
>>> col.null_count
0  # should be 1
```

I assume this is because Arrow does not treat NaNs as nulls. That makes sense semantically, but in the interchange protocol NaNs should count as nulls; see https://github.com/vaexio/vaex/issues/2120 for a related discussion.

See pandas for the expected behaviour:

```python
>>> import pandas as pd
>>> df = pd.DataFrame({"foo": [float("nan")]})
>>> dfi = df.__dataframe__()
>>> col = dfi.get_column(0)
>>> col.null_count
1
```

cc @AlenkaF (let me know if I shouldn't tag you on things! Coincidentally, I was working on https://github.com/data-apis/dataframe-interchange-tests/issues/20 today when Ralf commented, heh.)

### Component(s)

Python
