Matt Zhang created SPARK-56844:
----------------------------------
Summary: Support ArrayType / MapType / StructType in
ConstantColumnVector and FileSourceMetadataAttribute
Key: SPARK-56844
URL: https://issues.apache.org/jira/browse/SPARK-56844
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.2.0
Reporter: Matt Zhang

`FileSourceMetadataAttribute.isSupportedType` currently rejects ArrayType,
MapType, and StructType, citing `ColumnVectorUtils.populate` as the limiting
factor. As a result, file source constant metadata columns cannot use complex
types, even though the underlying machinery (`RowToColumnConverter` and the
`OffHeapColumnVector` array/map layout) already supports them.

This issue tracks broadening that gate, implementing the missing populate path
in `ColumnVectorUtils.populate` and the missing scatter path in
`ConstantColumnVector.writeToOffHeapColumnVector`, and enabling complex
constants end to end.

Specifically:
- `FileSourceMetadataAttribute.isSupportedType` accepts array/map/struct
recursively, provided that all nested types are also supported.
- `ColumnVectorUtils.populate` gains struct/array/map branches. The array and
map branches allocate a one-row off-heap backing vector and reuse the existing
`RowToColumnConverter` to write the constant value with full recursive type
support.
- `ConstantColumnVector` gains optional ownership of a backing
`WritableColumnVector` (released by `close()`), and `writeToOffHeapColumnVector`
gains array/map branches that copy the M constant elements into the target
child vector once and then write `(offset=0, length=M)` for every row.
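
To illustrate the recursive gate in the first bullet, here is a minimal Java sketch. It does not use Spark's `DataType` classes: `TypeGateSketch`, `SketchType`, and `Kind` are hypothetical stand-ins, and the set of leaf types the real `isSupportedType` accepts is not stated in this issue, so it is modeled as a single `ATOMIC` kind.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a type tree; not Spark's DataType API.
final class TypeGateSketch {
    enum Kind { ATOMIC, UNSUPPORTED, ARRAY, MAP, STRUCT }

    static final class SketchType {
        final Kind kind;
        // ARRAY: one child (element); MAP: two (key, value); STRUCT: one per field.
        final List<SketchType> children;

        SketchType(Kind kind, SketchType... children) {
            this.kind = kind;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    // A complex type is supported only if every nested type is supported.
    static boolean isSupported(SketchType type) {
        switch (type.kind) {
            case ATOMIC:
                return true; // assumption: stands in for the already-supported leaf types
            case ARRAY:
            case MAP:
            case STRUCT:
                for (SketchType child : type.children) {
                    if (!isSupported(child)) {
                        return false;
                    }
                }
                return true;
            default:
                return false;
        }
    }
}
```

Under this model, an array of maps with atomic keys and values passes the gate, while a struct containing one unsupported field is rejected as a whole.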
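
The `(offset=0, length=M)` layout in the last bullet can be pictured with the following sketch. It is an assumption-laden illustration, not the proposed implementation: `ConstantArrayScatterSketch` is a hypothetical class that uses on-heap `int[]` buffers in place of `OffHeapColumnVector`, and it only handles a flat array of ints.

```java
import java.util.Arrays;

// Hypothetical stand-in for an array-typed column: one flat child buffer
// plus a per-row (offset, length) pair pointing into it.
final class ConstantArrayScatterSketch {
    final int[] child;   // flattened element storage
    final int[] offsets; // per-row start index into child
    final int[] lengths; // per-row element count

    private ConstantArrayScatterSketch(int[] child, int[] offsets, int[] lengths) {
        this.child = child;
        this.offsets = offsets;
        this.lengths = lengths;
    }

    // Scatter one constant array value across numRows rows.
    static ConstantArrayScatterSketch scatter(int[] constant, int numRows) {
        int m = constant.length;
        int[] child = Arrays.copyOf(constant, m); // the M elements are copied once
        int[] offsets = new int[numRows];         // zero-initialized: every row has offset 0
        int[] lengths = new int[numRows];
        Arrays.fill(lengths, m);                  // every row has length M
        return new ConstantArrayScatterSketch(child, offsets, lengths);
    }

    // Read back the array stored at a given row.
    int[] getArray(int row) {
        return Arrays.copyOfRange(child, offsets[row], offsets[row] + lengths[row]);
    }
}
```

In this sketch the child storage stays at M elements regardless of the row count, because every row references the same slice rather than holding its own copy.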
No user-facing API changes.