[
https://issues.apache.org/jira/browse/SPARK-43226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-43226:
---------------------------------
Target Version/s: (was: 3.5.0)
> Define extractors for file-constant metadata columns
> ----------------------------------------------------
>
> Key: SPARK-43226
> URL: https://issues.apache.org/jira/browse/SPARK-43226
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 3.4.0
> Reporter: Ryan Johnson
> Priority: Major
>
> File-source constant metadata columns are often derived indirectly from
> file-level metadata values rather than exposing those values directly. For
> example, {{_metadata.file_name}} is currently hard-coded in
> {{FileFormat.updateMetadataInternalRow}} as:
>
> {code:java}
> UTF8String.fromString(filePath.getName)
> {code}
>
> We should add support for metadata extractors, functions that map from
> {{PartitionedFile}} to {{Literal}}, so that we can express such columns
> in a generic way instead of hard-coding them.
> We can't just add the derived values to the metadata map, because then they
> would have to be pre-computed even if the query never selects that field.
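To make the proposal concrete, here is a minimal, self-contained sketch of the extractor idea in plain Java. It is not Spark code: {{FileInfo}} is a hypothetical stand-in for {{PartitionedFile}}, plain Java objects stand in for {{Literal}}, and the registry, the column set, and the {{metadataRow}} helper are all assumptions made for illustration. The point it tries to show is that registering an extractor computes nothing, and that a value is derived only for the columns a query actually selects.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class MetadataExtractorSketch {

    // Hypothetical stand-in for Spark's PartitionedFile (only the fields used here).
    public static final class FileInfo {
        public final String path;
        public final long size;

        public FileInfo(String path, long size) {
            this.path = path;
            this.size = size;
        }
    }

    // Counts how often the file_name extractor runs, to make the laziness observable.
    public static int fileNameCalls = 0;

    // Column name -> extractor. Registering an extractor computes nothing.
    public static final Map<String, Function<FileInfo, Object>> EXTRACTORS =
            new LinkedHashMap<>();

    static {
        EXTRACTORS.put("file_path", f -> f.path);
        EXTRACTORS.put("file_name", f -> {
            fileNameCalls++;
            // Derived value: the last path segment, analogous to filePath.getName.
            return f.path.substring(f.path.lastIndexOf('/') + 1);
        });
        EXTRACTORS.put("file_size", f -> f.size);
    }

    // Builds the metadata values for one file, running only the selected extractors.
    public static Map<String, Object> metadataRow(FileInfo file, List<String> selected) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String column : selected) {
            Function<FileInfo, Object> extractor = EXTRACTORS.get(column);
            if (extractor == null) {
                throw new IllegalArgumentException("Unknown metadata column: " + column);
            }
            row.put(column, extractor.apply(file));
        }
        return row;
    }

    public static void main(String[] args) {
        FileInfo file = new FileInfo("/data/t/part-0001.parquet", 1024L);
        // Only file_size is selected, so the file_name extractor never runs.
        System.out.println(metadataRow(file, List.of("file_size")));
        System.out.println(metadataRow(file, List.of("file_name", "file_size")));
    }
}
```

With a pre-computed metadata map, the {{file_name}} value would have to be derived for every file up front. In this sketch the first call in {{main}} never touches that extractor, which is the behaviour the issue asks for.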