[ https://issues.apache.org/jira/browse/ARROW-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930052#comment-16930052 ]
Adam Hooper commented on ARROW-6568:
------------------------------------

My workaround, in my function that wraps `pyarrow.parquet.write_table()`:

{code:python}
if table.num_rows == 0:
    # Workaround for https://issues.apache.org/jira/browse/ARROW-6568
    # If table is zero-length, guarantee it has a RecordBatch so Arrow
    # won't crash when writing a DictionaryArray.
    def empty_array_for_field(field):
        if pyarrow.types.is_dictionary(field.type):
            return pyarrow.DictionaryArray.from_arrays(
                pyarrow.array([], type=field.type.index_type),
                pyarrow.array([], type=field.type.value_type),
            )
        else:
            return pyarrow.array([], type=field.type)

    table = pyarrow.table(
        {field.name: empty_array_for_field(field) for field in table.schema}
    )

# ... and now `table` is safe to use in `pyarrow.parquet.write_table()`.
{code}

> pyarrow.parquet crash writing zero-chunk dictionary-type column
> ---------------------------------------------------------------
>
>                 Key: ARROW-6568
>                 URL: https://issues.apache.org/jira/browse/ARROW-6568
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>         Environment: Pyarrow v0.14.1, manylinux1
>            Reporter: Adam Hooper
>            Priority: Major
>
> Trying to write a zero-RecordBatch file to parquet:
> {code:python}
> import pyarrow
> import pyarrow.parquet
> table = pyarrow.Table.from_batches([], pyarrow.schema([('A',
> pyarrow.dictionary(pyarrow.int32(), pyarrow.string()))]))
> pyarrow.parquet.write_table(table, 'x.parquet')
> {code}
> ... I receive an error and Python exits with exit code {{139}}:
> {noformat}
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> F0915 18:37:23.099939 1 table.cc:64] Check failed: (chunks.size()) > (0)
> cannot construct ChunkedArray from empty vector and omitted type
> *** Check failure stack trace: ***
> {noformat}

--
This message was sent by Atlassian Jira (v8.3.2#803003)