hi folks,

For some time now I have been uncertain about the utility provided by the arrow::Column C++ class. Fundamentally, it is a container for two things:

* An arrow::Field object (name and data type)
* An arrow::ChunkedArray object for the data

It was added to the C++ library in ARROW-23 in March 2016 as the basis for the arrow::Table class which represents a collection of ChunkedArray objects coming usually from multiple RecordBatches. Sometimes a Table will have mostly columns with a single chunk while some columns will have many chunks.

I'm concerned about continuing to maintain the Column class as it's spilling complexity into computational libraries and bindings alike. The Python Column class for example mostly forwards method calls to the underlying ChunkedArray

https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L355

If the developer wants to construct a Table or insert a new "column", Column objects must generally be constructed, leading to boilerplate without clear benefit.

Since we're discussing building a more significant higher-level DataFrame interface per past mailing list discussions, my preference would be to consider removing the Column class to make the user- and developer-facing data structures simpler. I hate to propose breaking API changes, so it may not be practical at this point, but I wanted to at least bring up the issue to see if others have opinions after working with the library for a few years.

Thanks
Wes