> > On a related note, such encoding would address DataFusion's issue of
> > representing scalars / constant arrays: a constant array would be
> > represented as a repetition. Currently we just unpack (i.e. allocate) a
> > constant array when we want to transfer through a RecordBatch.
>
In the Julia implementation, we recently merged support <https://github.com/JuliaData/Arrow.jl/pull/156> for more flexible usage of extension types. One use case that came up was representing the `nothing` Julia value, often referred to as the "software engineer's null", as opposed to the `missing` value, which is a propagating "data" null. Via extension types, we allow treating `nothing` as a "NullKind": it is serialized as a null vector with the extension type "JuliaLang.Nothing", which lets us correctly deserialize the null vector as a `Vector{Nothing}` when reading (well, technically a `NullVector{Nothing}`, since it's a custom array type, but hopefully you get the point).

Anyway, all that to say that this isn't quite constant arrays, but it's pretty close: you encode the constant value in the extension type metadata. This probably isn't a very satisfying approach as a cross-language convention, however, since I know extension types are more in the "metadata" realm semantically.

Perhaps some generalization of the `Null` type could be good in the future, though: something like a `Constant` type that has a field for the value and a field for the type. It would then use the same encoding as the `NullVector`, where we just encode the length and no actual buffers need to be serialized or deserialized.

-Jacob
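
To make the `Constant` idea above a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: `ConstantArray`, its fields, and `to_wire` are hypothetical names, not part of the Arrow format, Arrow.jl, or any Arrow library. The point is that only the value, its type, and the length are stored, so nothing is allocated until a consumer asks for a materialized copy.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ConstantArray:
    """Hypothetical constant array: one value, a type name, a length, no buffers."""

    value: Any
    type_name: str
    length: int

    def __post_init__(self) -> None:
        if self.length < 0:
            raise ValueError("length must be non-negative")

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> Any:
        # Every valid position holds the same value; only bounds are checked.
        if not -self.length <= index < self.length:
            raise IndexError(index)
        return self.value

    def to_wire(self) -> dict:
        # What would be serialized: metadata only, no per-element data
        # (assumed layout, for illustration).
        return {"type": self.type_name, "value": self.value, "length": self.length}

    def materialize(self) -> list:
        # The allocation a consumer pays only if it needs a real array.
        return [self.value] * self.length


# A null vector is then the special case where the constant is the null value.
nothings = ConstantArray(value=None, type_name="null", length=3)
print(list(nothings))  # [None, None, None]
```

A consumer that understands the type can answer `len` and element access directly from the three fields, and one that does not can fall back to `materialize()`.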