Greetings Apache Dev Mailing List,

I'm interested in adding complex number support to Arrow. The use case is Radio Astronomy data, which is represented by complex values.

xref https://issues.apache.org/jira/browse/ARROW-638
xref https://github.com/apache/arrow/pull/10452

It's fairly easy to support Complex Numbers as a Python Extension -- see, for example, how I've done it here using a list(float{32,64}):

https://github.com/ska-sa/dask-ms/blob/a5bd8538ea3de9fabb8fe74e89c3a75c4043f813/daskms/experimental/arrow/extension_types.py#L144-L173

The above seems to work with the standard NumPy complex memory layout (consecutive pairs of [real, imag] values) and should work with the C++ std::complex layout. Note that C complex and C++ std::complex should also have the same layout: https://stackoverflow.com/a/10540346.

However, this confines this representation of Complex Numbers to dask-ms only. I think it would be better to add support at a base level in Arrow, especially since this would allow other packages to understand the Complex Number Type. For example, it would be useful to:

1. Have a clearly defined Pandas -> Arrow -> Parquet -> Arrow -> Pandas roundtrip. Currently there's no Pandas -> Arrow conversion for np.complex{64, 128}.
2. Support complex number types in query engines like DataFusion and BlazingSQL, if only initially via selection on indexing columns.

I started a PR at https://github.com/apache/arrow/pull/10452 adding Complex Numbers as a first-class Arrow type, although I note that https://issues.apache.org/jira/browse/ARROW-638?focusedCommentId=16912456&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16912456 suggests implementing this as a C++ Extension Type on a first pass. Initial experiments suggest this is pretty doable -- I've got some test cases running already.

I have some questions going forward:

- Adding first-class complex types seems to involve modifying cpp/src/arrow/ipc/feather.fbs, which may change the protocol and introduce breaking changes. I'm not sure about this and seek advice on how invasive this approach is and whether it's worth pursuing.
- list(float{32,64}) seems to work fine as an ExtensionType, but I'd imagine a struct([real, imag]) might offer more in terms of affordance to the user. I'd imagine the underlying memory layout would be the same.
- I don't have a clear understanding of whether adding either a First-Class or ExtensionType involves supporting numeric operations on that type (e.g. Complex Exponential, Absolutes, Min or Max operations) or whether Arrow is merely concerned with the underlying data representation.

Thanks for considering this.

Simon Perkins
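
P.S. To make the layout discussion above concrete, here is a minimal sketch using only the Python standard library. It is not the dask-ms code and does not use Arrow or NumPy; the helper names (to_interleaved, from_interleaved, to_planar) are made up for illustration. It contrasts the interleaved [real, imag] layout described above with a planar layout that keeps real and imaginary parts in separate buffers, since which of the two a struct([real, imag]) would produce is part of what I'm asking.

```python
from array import array


def to_interleaved(values):
    """Flatten complex values into [re0, im0, re1, im1, ...] as float64."""
    flat = array("d")
    for z in values:
        flat.append(z.real)
        flat.append(z.imag)
    return flat


def from_interleaved(flat):
    """Rebuild complex values from consecutive (real, imag) pairs."""
    if len(flat) % 2:
        raise ValueError("expected an even number of floats")
    return [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def to_planar(values):
    """Split complex values into separate real and imaginary buffers."""
    reals = array("d", (z.real for z in values))
    imags = array("d", (z.imag for z in values))
    return reals, imags


values = [1 + 2j, 3.5 - 4j, -0.25 + 0j]

# Interleaved: one buffer of 2 * len(values) floats, pairs are adjacent.
flat = to_interleaved(values)

# Planar: two buffers of len(values) floats each.
reals, imags = to_planar(values)

# Both layouts hold the same information; only the byte order differs.
roundtrip = from_interleaved(flat)
```

Both layouts round-trip the same values, but only the interleaved buffer could be reinterpreted in place as an array of complex numbers, so the choice matters for zero-copy interop.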