[ https://issues.apache.org/jira/browse/ARROW-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468106#comment-17468106 ]
David Li commented on ARROW-15237:
----------------------------------
Agreed, the coupling would be annoying. (I think I've seen projects that
include a link to a custom URL shortener or something in the error messages,
but that's a lot of setup.)
As for whether there's a valid case: I can't quite imagine one where you would
do that instead of just dropping the column entirely. Unless you really only
need the length (and not even validity) for some reason?
> [C++] Add cast to Null from any type?
> -------------------------------------
>
> Key: ARROW-15237
> URL: https://issues.apache.org/jira/browse/ARROW-15237
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Weston Pace
> Priority: Major
>
> The "cannot cast to null" error came up during a dataset operation in this SO
> question:
> https://stackoverflow.com/questions/70566660/parquet-with-null-columns-on-pyarrow/70568419#70568419
> Although I suspect casting to null is generally a sign that the user is doing
> something wrong (why throw away data?) there may be some corner cases where
> it is desired and it may be nice just for consistency.
> Simple reproduction (admittedly, the best answer here would probably be to
> use the schema from tab2):
> {code}
> import os
> import pyarrow as pa
> import pyarrow.dataset as ds
> import pyarrow.parquet as pq
> tab = pa.Table.from_pydict({'x': [1, 2, 3], 'y': [None, None, None]})
> tab2 = pa.Table.from_pydict({'x': [4, 5, 6], 'y': ['x', 'y', 'z']})
> os.makedirs('/tmp/null_first_dataset', exist_ok=True)
> pq.write_table(tab, '/tmp/null_first_dataset/0.parquet')
> pq.write_table(tab2, '/tmp/null_first_dataset/1.parquet')
> dataset = ds.dataset('/tmp/null_first_dataset')
> tab = dataset.to_table()
> print(tab)
> {code}