Leo Meyerovich created ARROW-4131:
-------------------------------------

Summary: [Python] Coerce mixed columns to String
Key: ARROW-4131
URL: https://issues.apache.org/jira/browse/ARROW-4131
Project: Apache Arrow
Issue Type: Improvement
Reporter: Leo Meyerovich
Continuing [https://github.com/apache/arrow/issues/3280].

I'm seeing variants of this elsewhere (e.g., [wesm/feather#349|https://github.com/wesm/feather/issues/349]). Not all pandas DataFrames convert to Arrow tables. When a conversion fails, it does not fail in a way that is conducive to automation.

Sample:

{code:python}
import pandas as pd
import pyarrow as pa

mixed_df = pd.DataFrame({'mixed': [1, 'b']})
pa.Table.from_pandas(mixed_df)
# => ArrowInvalid: ('Could not convert b with type str: tried to convert to double', 'Conversion failed for column mixed with type object')
{code}

I would have expected behavior more like one of the following:
* Coerce {{toString}} by default, with a default-off option to disallow {{toString}} coercions
* Provide a default-off option to {{from_pandas}} to auto-coerce
* Name the exception so it is clear that this is a column coercion failure, and include the column name(s), so the failure is predictable and clearly handleable by both library writers and users

I lean towards:
* Auto-coerce by default, which improves the experience of early users: `coerce_mixed_columns_to_strings=True`
* For the less frequent but more advanced case of library implementors, allow them to override this to `False`
* In that case, raise a predictable, machine-readable exception: `MixedColumnException(mixed_columns=['a', 'b', ...], msg="....")`
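Until something like this exists in {{from_pandas}}, the proposed behavior can be approximated as a pre-pass in user code. The sketch below is only an illustration under stated assumptions: {{MixedColumnException}} and the {{coerce_mixed_columns_to_strings}} flag are the names proposed above, not existing pyarrow APIs, and {{find_mixed_columns}} / {{coerce_mixed_columns}} are hypothetical helpers. It treats an {{object}} column as "mixed" when its non-null values span more than one Python type, and either casts those values to {{str}} (keeping missing values missing) or raises an exception listing every offending column.

```python
import pandas as pd


class MixedColumnException(ValueError):
    """Hypothetical machine-readable error (name taken from the proposal)."""

    def __init__(self, mixed_columns, msg="columns contain mixed Python types"):
        super().__init__(f"{msg}: {list(mixed_columns)}")
        # Expose the offending column names so callers can handle them programmatically.
        self.mixed_columns = list(mixed_columns)


def find_mixed_columns(df):
    """Hypothetical helper: names of object columns whose non-null values have >1 Python type."""
    mixed = []
    for name in df.columns:
        col = df[name]
        if col.dtype != object:
            continue  # numeric, bool, datetime, etc. columns are already homogeneous
        value_types = {type(v) for v in col.dropna()}
        if len(value_types) > 1:
            mixed.append(name)
    return mixed


def coerce_mixed_columns(df, coerce_mixed_columns_to_strings=True):
    """Hypothetical pre-pass: cast mixed columns to str, or raise MixedColumnException."""
    mixed = find_mixed_columns(df)
    if not mixed:
        return df
    if not coerce_mixed_columns_to_strings:
        # Opt-out path for library implementors: fail predictably, naming every column.
        raise MixedColumnException(mixed_columns=mixed)
    out = df.copy()
    for name in mixed:
        col = out[name]
        # Stringify non-missing values; keep None/NaN as-is instead of "None"/"nan".
        out[name] = col.where(col.isna(), col.astype(str))
    return out
```

Usage would then be {{pa.Table.from_pandas(coerce_mixed_columns(mixed_df))}}, which hands Arrow a column containing only strings (and nulls). Scanning every value of each {{object}} column is O(rows) in Python, so a built-in version inside the Arrow converter would presumably detect the mixed types during the conversion attempt instead.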