[jira] [Commented] (ARROW-15237) [C++] Add cast to Null from any type?

David Li (Jira) Mon, 03 Jan 2022 09:10:06 -0800


    [ https://issues.apache.org/jira/browse/ARROW-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17468106#comment-17468106 ]


David Li commented on ARROW-15237:
----------------------------------

Agreed, the coupling would be annoying. (I think I've seen projects that 
include a link to a custom URL shortener or something in the error messages, 
but that's a lot of setup.)

As for whether there's a valid use case: I can't quite imagine one where you 
would cast to null instead of just dropping the column entirely, unless you 
really only need the length (and not even the validity) for some reason.

> [C++] Add cast to Null from any type?
> -------------------------------------
>
>                 Key: ARROW-15237
>                 URL: https://issues.apache.org/jira/browse/ARROW-15237
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> The "cannot cast to null" error came up during a dataset operation in this 
> Stack Overflow question: 
> https://stackoverflow.com/questions/70566660/parquet-with-null-columns-on-pyarrow/70568419#70568419
> Although I suspect casting to null is generally a sign that the user is doing 
> something wrong (why throw away data?), there may be some corner cases where 
> it is desired, and it may be nice to support just for consistency.
> Simple reproduction (admittedly, the best answer here would probably be to 
> use the schema from tab2):
> {code}
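> import os
> import pyarrow as pa
> import pyarrow.dataset as ds
> import pyarrow.parquet as pq
> # 'y' is all None in the first table, so its type is inferred as null
> tab = pa.Table.from_pydict({'x': [1, 2, 3], 'y': [None, None, None]})
> # 'y' is a string column in the second table
> tab2 = pa.Table.from_pydict({'x': [4, 5, 6], 'y': ['x', 'y', 'z']})
> os.makedirs('/tmp/null_first_dataset', exist_ok=True)
> pq.write_table(tab, '/tmp/null_first_dataset/0.parquet')
> pq.write_table(tab2, '/tmp/null_first_dataset/1.parquet')
> # With no explicit schema, the dataset schema is inferred from the first file (y: null)
> dataset = ds.dataset('/tmp/null_first_dataset')
> # Scanning the second file then needs a string -> null cast, which fails
> tab = dataset.to_table()
> print(tab)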
> {code} 



