[ https://issues.apache.org/jira/browse/SPARK-56931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-56931:
-----------------------------------
    Labels: pull-request-available  (was: )

> Support ArrayType/MapType/StructType constant metadata in row materialization path
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-56931
>                 URL: https://issues.apache.org/jira/browse/SPARK-56931
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Matt Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0, 5.0.0
>
>
> Follow-up to SPARK-56844, which enabled ArrayType/MapType/StructType in 
> FileSourceMetadataAttribute and added the populate() branches for 
> ConstantColumnVector. That covered the columnar metadata path (ColumnarBatch 
> output).
>
> For file scans that produce row output (Batched=false: text, JSON, CSV, or
> any reader that does not vectorize), the metadata row is filled via
> FileFormat.updateMetadataInternalRow -> getFileConstantMetadataColumnValue ->
> Literal(extractor.apply(file)).
>
> Literal.apply(Any) dispatches on the value class and has no case for
> ArrayData / MapData / InternalRow, so a complex constant metadata column
> trips UNSUPPORTED_FEATURE.LITERAL_TYPE before the row is populated.
>
> Separately, SchemaPruning.sortLeftFieldsByRight recurses through the metadata
> schema and prunes nested struct fields inside an array/map/struct subfield.
> That is correct for data files (the reader projects the requested columns)
> but wrong for constant metadata, where each subfield is produced whole by a
> single extractor: pruning removes Catalyst row positions that the extractor
> still fills, so its output no longer lines up with the pruned schema.
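A dependency-free Scala sketch of the row-path failure described above. Everything in it is hypothetical and only models the shape of the problem: `RowMaterializationSketch`, `FileInfo`, the `file_tags` column, and the simplified `DataType`, `ArrayData` and `Literal` stand-ins are not Spark's classes, and this is not necessarily how the fix is implemented. Inferring a literal's type from the value's runtime class (the way `Literal.apply(Any)` does) has no case for the complex value, while taking the type from the column's declared schema does not need one.

```scala
// Hypothetical, dependency-free model of the row metadata path.
// None of these types are Spark classes; they only mimic the shape of the problem.
object RowMaterializationSketch {
  sealed trait DataType
  case object LongType extends DataType
  case object StringType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  // Stand-in for a Catalyst-internal complex value (hypothetical).
  final case class ArrayData(values: Vector[Any])

  // Stand-in literal carrying a value and its type.
  final case class Literal(value: Any, dataType: DataType)

  // Hypothetical per-file information handed to each extractor.
  final case class FileInfo(path: String, length: Long, tags: Vector[String])

  // Infers the type from the value's runtime class; complex values have
  // no case here, so they fall through to the error branch.
  def literalByValueClass(v: Any): Literal = v match {
    case l: Long   => Literal(l, LongType)
    case s: String => Literal(s, StringType)
    case other =>
      throw new IllegalArgumentException(
        s"UNSUPPORTED_FEATURE.LITERAL_TYPE: ${other.getClass.getSimpleName}")
  }

  // Uses the data type declared for the metadata column instead of guessing it.
  def literalByDeclaredType(v: Any, dt: DataType): Literal = Literal(v, dt)

  // Metadata column name, declared type, and extractor (all hypothetical).
  val extractors: Vector[(String, DataType, FileInfo => Any)] = Vector(
    ("file_path", StringType, (f: FileInfo) => f.path),
    ("file_size", LongType, (f: FileInfo) => f.length),
    ("file_tags", ArrayType(StringType), (f: FileInfo) => ArrayData(f.tags))
  )

  // Builds one literal per metadata column for a single file.
  def materialize(file: FileInfo, byDeclaredType: Boolean): Vector[Literal] =
    extractors.map { case (_, dt, extract) =>
      val v = extract(file)
      if (byDeclaredType) literalByDeclaredType(v, dt) else literalByValueClass(v)
    }
}
```

Calling `materialize(file, byDeclaredType = false)` throws on the `file_tags` column, while `byDeclaredType = true` yields three typed literals. Building the literal from the declared type mirrors Catalyst's two-argument `Literal(value, dataType)` constructor; whether the actual change takes that route is an assumption here.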

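A similar Scala sketch for the pruning problem described above. `ConstantMetadataPruningSketch`, `pruneMetadata`, the `constantFields` set, and the `blocks` column are hypothetical; `prune` only approximates nested pruning for data files (it keeps the source field order and does not reproduce `SchemaPruning.sortLeftFieldsByRight`). For a query that reads only `blocks[].size`, data-file style pruning shrinks the element struct to one field, while the metadata variant keeps the element type whole, so it still matches the two-field rows a single extractor would emit.

```scala
// Hypothetical model of nested schema pruning; not Spark's SchemaPruning code.
object ConstantMetadataPruningSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object LongType extends DataType
  final case class StructField(name: String, dataType: DataType)
  final case class StructType(fields: Vector[StructField]) extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  // Keeps only the requested subfields, recursing into structs and array
  // elements; roughly what nested pruning does for data file columns.
  def prune(full: DataType, requested: DataType): DataType = (full, requested) match {
    case (StructType(fs), StructType(rs)) =>
      val wanted = rs.map(r => r.name -> r.dataType).toMap
      StructType(fs.collect {
        case f if wanted.contains(f.name) =>
          StructField(f.name, prune(f.dataType, wanted(f.name)))
      })
    case (ArrayType(e), ArrayType(re)) => ArrayType(prune(e, re))
    case _ => full
  }

  // Metadata variant: top-level fields are still selected, but a subfield
  // produced whole by one constant extractor keeps its full type.
  def pruneMetadata(
      metadata: StructType,
      requested: StructType,
      constantFields: Set[String]): StructType = {
    val wanted = requested.fields.map(r => r.name -> r.dataType).toMap
    StructType(metadata.fields.collect {
      case f if wanted.contains(f.name) && constantFields.contains(f.name) => f
      case f if wanted.contains(f.name) =>
        StructField(f.name, prune(f.dataType, wanted(f.name)))
    })
  }
}
```

Treating constant subfields as leaves is one way to keep Catalyst row positions aligned with what the extractor writes; the issue text does not say whether the fix does exactly this.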


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
