[
https://issues.apache.org/jira/browse/SPARK-56931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-56931:
-----------------------------------
Labels: pull-request-available (was: )
> Support ArrayType/MapType/StructType constant metadata in row materialization path
> ----------------------------------------------------------------------------------
>
> Key: SPARK-56931
> URL: https://issues.apache.org/jira/browse/SPARK-56931
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Matt Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.2.0, 5.0.0
>
>
> Follow-up to SPARK-56844, which enabled ArrayType/MapType/StructType in
> FileSourceMetadataAttribute and added the populate() branches for
> ConstantColumnVector. That covered the columnar metadata path (ColumnarBatch
> output).
> For file scans that produce row output (Batched=false: text, JSON, CSV, or
> any reader that does not vectorize), the metadata row is filled via
> FileFormat.updateMetadataInternalRow -> getFileConstantMetadataColumnValue ->
> Literal(extractor.apply(file)).
> Literal.apply(Any) dispatches on the value class and has no case for
> ArrayData / MapData / InternalRow, so a complex constant metadata column
> trips UNSUPPORTED_FEATURE.LITERAL_TYPE before the row is populated.
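The failure mode in the quoted paragraph above can be sketched without Spark on the classpath. Everything below is a toy stand-in: `ToyType`, `ToyArrayData`, `ToyLiteral`, and `ToyLiterals` are hypothetical names, not Catalyst classes, and the `typed` helper only illustrates one possible way around the problem (passing the declared type alongside the value). It is not a claim about what the linked pull request does.

```scala
// Toy model of a literal factory that infers the type from the runtime class.
// None of these names are Spark APIs; they exist only for this sketch.
sealed trait ToyType
case object ToyString extends ToyType
case object ToyLong extends ToyType
final case class ToyArray(element: ToyType) extends ToyType

// Stand-in for an internal array representation such as ArrayData.
final case class ToyArrayData(values: Seq[Any])

final case class ToyLiteral(value: Any, dataType: ToyType)

object ToyLiterals {
  // Dispatches on the value class. There is no case for ToyArrayData,
  // so a complex value falls through to the error branch.
  def infer(v: Any): ToyLiteral = v match {
    case s: String => ToyLiteral(s, ToyString)
    case l: Long   => ToyLiteral(l, ToyLong)
    case other =>
      throw new UnsupportedOperationException(
        s"UNSUPPORTED_FEATURE.LITERAL_TYPE: ${other.getClass.getName}")
  }

  // Assumed workaround for illustration: the caller already knows the
  // declared type of the metadata column, so no inference is needed.
  def typed(v: Any, dataType: ToyType): ToyLiteral = ToyLiteral(v, dataType)
}
```

In this sketch `ToyLiterals.infer("a.txt")` succeeds, `ToyLiterals.infer(ToyArrayData(Seq("x")))` throws, and `ToyLiterals.typed(ToyArrayData(Seq("x")), ToyArray(ToyString))` builds the literal because the type is supplied rather than guessed.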
> Separately, SchemaPruning.sortLeftFieldsByRight recurses through the metadata
> schema and prunes nested struct fields inside an array/map/struct subfield.
> That is correct for data files (the reader projects the requested columns)
> but wrong for constant metadata, where each subfield is produced whole by a
> single extractor; pruning removes Catalyst row positions that the extractor
> still fills, so the pruned schema and the extracted value no longer line up.
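The pruning problem in the last quoted paragraph comes down to positional access. Below is a minimal sketch under simplifying assumptions: a struct is modeled as a `Vector` of field names, a row as a `Vector[Any]`, and `PruningSketch` with its `extract`, `prune`, and `read` helpers is hypothetical, not Spark code. The field names `path`, `size`, and `mtime` are made up for the example.

```scala
object PruningSketch {
  // Hypothetical constant metadata struct; one extractor produces all fields.
  val fullSchema: Vector[String] = Vector("path", "size", "mtime")

  // The extractor always emits the whole struct, in full-schema order.
  def extract(file: String): Vector[Any] = Vector(file, 42L, 1000L)

  // Stand-in for nested pruning: keep only the requested fields, in order.
  def prune(schema: Vector[String], requested: Set[String]): Vector[String] =
    schema.filter(requested.contains)

  // Positional read: looks up the field's ordinal in the given schema
  // and uses it to index the row, as an ordinal-based accessor would.
  def read(row: Vector[Any], schema: Vector[String], field: String): Any =
    row(schema.indexOf(field))
}
```

With `val row = PruningSketch.extract("a.txt")` and a schema pruned to `Set("size")`, `PruningSketch.read(row, pruned, "size")` returns `"a.txt"`, because ordinal 0 of the pruned schema points at the extractor's first field. Reading against `PruningSketch.fullSchema` returns `42L`, which is why leaving constant metadata subfields unpruned keeps ordinals and values aligned in this toy model.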