[
https://issues.apache.org/jira/browse/ARROW-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468104#comment-17468104
]
Weston Pace commented on ARROW-15237:
-------------------------------------
Right, I agree this doesn't help much with the SO question. We would still get
the question "I read this dataset and my entire column is null" and I wouldn't
really bother trying to detect this type of situation to give a better error
message.

I suppose I was wondering if there would ever be a valid case for a cast to
null in a query.

As for linking to docs in the error messages, my gut reaction is "don't do
that" just to avoid introducing the coupling (e.g. in case we ever change URL
schemes) but I probably wouldn't oppose it.

> [C++] Add cast to Null from any type?
> -------------------------------------
>
> Key: ARROW-15237
> URL: https://issues.apache.org/jira/browse/ARROW-15237
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
>
> The "cannot cast to null" error came up during a dataset operation in this SO
> question:
> https://stackoverflow.com/questions/70566660/parquet-with-null-columns-on-pyarrow/70568419#70568419
> Although I suspect casting to null is generally a sign that the user is doing
> something wrong (why throw away data?) there may be some corner cases where
> it is desired and it may be nice just for consistency.
> Simple reproduction (admittedly, the best answer here would probably be to
> use the schema from tab2):
> {code}
> import os
> import pyarrow as pa
> import pyarrow.dataset as ds
> import pyarrow.parquet as pq
> # 'y' is entirely null in the first table and a string column in the second.
> tab = pa.Table.from_pydict({'x': [1, 2, 3], 'y': [None, None, None]})
> tab2 = pa.Table.from_pydict({'x': [4, 5, 6], 'y': ['x', 'y', 'z']})
> os.makedirs('/tmp/null_first_dataset', exist_ok=True)
> pq.write_table(tab, '/tmp/null_first_dataset/0.parquet')
> pq.write_table(tab2, '/tmp/null_first_dataset/1.parquet')
> # No schema is passed, so it is inferred from the files on disk.
> dataset = ds.dataset('/tmp/null_first_dataset')
> tab = dataset.to_table()
> print(tab)
> {code}