>
> > On a related note, such encoding would address DataFusion's issue of
> > representing scalars / constant arrays: a constant array would be
> > represented as a repetition. Currently we just unpack (i.e. allocate) a
> > constant array when we want to transfer through a RecordBatch.
>

In the Julia implementation, we recently merged support
<https://github.com/JuliaData/Arrow.jl/pull/156> for more flexible usage of
extension types. One use-case that came up was representing the `nothing`
Julia value, which is often referred to as the "software engineer's null"
as opposed to the `missing` value, which is a propagating "data" null
value. Via extension types, we allow treating `nothing` as a "NullKind":
it is serialized as a null vector with the extension type
"JuliaLang.Nothing", which lets a reader correctly deserialize the null
vector as a `Vector{Nothing}` (well, technically a `NullVector{Nothing}`,
since it's a custom array type, but hopefully you get the point).

Anyway, all that to say that this isn't quite constant arrays, but it's
pretty close: you encode the constant value in the extension type
metadata. This probably isn't a very satisfying approach for a
cross-language convention, however, since I know extension types are more
in the "metadata" realm semantically.
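
To make the metadata idea concrete, here is a minimal sketch in plain
Python (not Julia, and not the Arrow.jl or pyarrow API). The column is
modeled as an ordinary dict, and `encode_constant`, `decode_constant`, and
the extension name "example.constant" are hypothetical names chosen for
illustration. Only the two `ARROW:extension:*` keys are Arrow's reserved
field-metadata keys for extension types.

```python
import json

# Arrow's reserved field-metadata keys for extension types.
EXT_NAME = "ARROW:extension:name"
EXT_META = "ARROW:extension:metadata"


def encode_constant(value, length, ext_name="example.constant"):
    """Describe `length` copies of `value` as a null column plus metadata.

    The dict layout and the default extension name are assumptions made
    for this sketch; they are not an Arrow or Arrow.jl convention.
    """
    return {
        "type": "null",      # a null column carries no data buffers
        "length": length,    # the only per-column payload
        "metadata": {EXT_NAME: ext_name, EXT_META: json.dumps(value)},
    }


def decode_constant(column):
    """Rebuild the repeated value, or plain nulls if the column is untagged."""
    meta = column.get("metadata", {})
    if EXT_NAME in meta and EXT_META in meta:
        value = json.loads(meta[EXT_META])
        return [value] * column["length"]
    # A reader that does not see the extension still gets a valid null column.
    return [None] * column["length"]
```

A reader that does not recognize the extension falls back to an ordinary
null column, which is exactly why this feels like metadata rather than a
real cross-implementation encoding.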

Perhaps some generalization of the `Null` type could be good in the
future, though: something like a `Constant` type that has a field for the
value and a field for the value's type. It could then reuse the
`NullVector`-style encoding, where we just encode the length and no actual
buffers need to be serialized/deserialized.
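
As a rough illustration of what such a `Constant` type might look like in
memory, here is a small Python sketch. `ConstantArray` and its field names
are hypothetical; nothing like it exists in the Arrow format today. It
stores only the value, a type label, and the length, and materializes
elements on demand.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ConstantArray:
    """Hypothetical constant array: one value, one type label, one length."""

    value: Any
    dtype: str    # type label, e.g. "int64"; a plain string in this sketch
    length: int

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        # Every valid index yields the same value; no buffer is consulted.
        if not -self.length <= i < self.length:
            raise IndexError(i)
        return self.value

    def __iter__(self):
        return (self.value for _ in range(self.length))

    def to_list(self):
        # Unpacking (allocating) happens only when a caller asks for it.
        return [self.value] * self.length
```

The storage cost is the same whether `length` is 3 or 3 million, which is
the property the DataFusion use case above is after.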

-Jacob
