[
https://issues.apache.org/jira/browse/ARROW-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wes McKinney reopened ARROW-1681:
---------------------------------
> [Python] Error writing with nulls in lists
> ------------------------------------------
>
> Key: ARROW-1681
> URL: https://issues.apache.org/jira/browse/ARROW-1681
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> Created from https://github.com/apache/arrow/issues/1208
> Hi,
> Not sure if this is related to, or the same as, ARROW-1584, but I can't seem
> to find a way to handle arrays of lists that occasionally consist only of
> empty lists.
> To reproduce:
> {code}
> import pyarrow as pa
>
> na = [] # None, [""]
> arrays = {
>     'c1': pa.array([["test"], na, na], type=pa.list_(pa.string())),
>     'c2': pa.array([na, na, na], type=pa.list_(pa.string())),
> }
> rb = pa.RecordBatch.from_arrays(list(arrays.values()), list(arrays.keys()))
> df = rb.to_pandas()
> pa.serialize_pandas(df)
> # > ArrowNotImplementedError: Unable to convert type: null
> tbl = pa.Table.from_pandas(df)
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchFileWriter(sink, tbl.schema)
> writer.write_table(tbl)
> # > ArrowNotImplementedError: Unable to convert type: null
> {code}
> In my use case I'm processing data in batches where individual fields contain
> lists of strings. Some of the batches may, however, contain only empty lists,
> and there doesn't seem to be any representation in Arrow at the moment that
> handles this situation.
> Also, since I'm serializing the batches into a single file/stream, their
> schemas need to be consistent, which is why I tried explicitly specifying the
> type of the array as list_(string). The only workaround I've found is to
> replace empty lists with [""], but that implies lots of unnecessary glue code
> on the client side. Is there a better workaround until this is fixed in an
> official conda release?
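
For illustration only, the client-side glue code mentioned in the report (replacing empty lists with [""]) might look roughly like the sketch below. The helper names pad_empty_lists and unpad_empty_lists and the PLACEHOLDER constant are hypothetical and do not come from the report or from pyarrow. The sketch is plain Python so it runs without Arrow installed; the padded column would then be passed to pa.array(..., type=pa.list_(pa.string())) as in the reproduction above.

```python
from typing import List, Optional, Sequence

# Hypothetical sentinel: the report substitutes [""] for [].
PLACEHOLDER = ""


def pad_empty_lists(
    column: Sequence[Optional[List[str]]],
) -> List[Optional[List[str]]]:
    """Return a copy of `column` with each empty list replaced by [PLACEHOLDER].

    None cells and non-empty lists are passed through unchanged.
    """
    return [
        [PLACEHOLDER] if cell is not None and len(cell) == 0 else cell
        for cell in column
    ]


def unpad_empty_lists(
    column: Sequence[Optional[List[str]]],
) -> List[Optional[List[str]]]:
    """Reverse pad_empty_lists by turning [PLACEHOLDER] back into []."""
    return [[] if cell == [PLACEHOLDER] else cell for cell in column]


column = [["test"], [], []]
padded = pad_empty_lists(column)      # safe to hand to the list-of-strings array
restored = unpad_empty_lists(padded)  # applied after reading the data back
```

One caveat of this approach: after a round trip, a genuine [""] value cannot be told apart from a padded empty list, so it only suits data where a single empty string never occurs as a real value.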