Ryan Johnson created SPARK-43226:
------------------------------------
Summary: Define extractors for file-constant metadata columns
Key: SPARK-43226
URL: https://issues.apache.org/jira/browse/SPARK-43226
Project: Spark
Issue Type: New Feature
Components: Spark Core
Affects Versions: 3.4.0
Reporter: Ryan Johnson
File-source constant metadata columns are often derived indirectly from
file-level metadata values rather than exposing those values directly. For
example, {{_metadata.file_name}} is currently hard-coded in
{{FileFormat.updateMetadataInternalRow}} as:
{code:java}
UTF8String.fromString(filePath.getName){code}
We should add support for metadata extractors, functions that map from
{{PartitionedFile}} to {{Literal}}, so that we can express such columns in
a generic way instead of hard-coding them.
We can't just add the derived values to the metadata map, because they would
then have to be pre-computed for every file even if the query never selects
that field.
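As a rough illustration of the idea, and not Spark's actual API, an extractor
registry could look something like the sketch below. {{PartitionedFile}} and
{{Literal}} are replaced by minimal hypothetical stand-in classes, the names
{{EXTRACTORS}}, {{extract}} and {{fileNameCalls}} are invented for this example,
and the file name is approximated by taking the text after the last '/' rather
than by calling {{filePath.getName}}.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class MetadataExtractorSketch {
    // Hypothetical stand-in for Spark's PartitionedFile (not the real class).
    static final class PartitionedFile {
        final String filePath;
        final long fileSize;

        PartitionedFile(String filePath, long fileSize) {
            this.filePath = filePath;
            this.fileSize = fileSize;
        }
    }

    // Hypothetical stand-in for Catalyst's Literal (not the real class).
    static final class Literal {
        final Object value;

        Literal(Object value) {
            this.value = value;
        }
    }

    // Counts file_name extractor calls, only to make the laziness observable.
    static int fileNameCalls = 0;

    // Assumed registry shape: metadata field name -> extractor function.
    static final Map<String, Function<PartitionedFile, Literal>> EXTRACTORS =
        new LinkedHashMap<>();

    static {
        EXTRACTORS.put("file_size", f -> new Literal(f.fileSize));
        EXTRACTORS.put("file_name", f -> {
            fileNameCalls++;
            // Approximates filePath.getName: keep the text after the last '/'.
            String path = f.filePath;
            return new Literal(path.substring(path.lastIndexOf('/') + 1));
        });
    }

    // Runs extractors only for the fields the query actually selects.
    static List<Literal> extract(PartitionedFile file, List<String> selectedFields) {
        List<Literal> values = new ArrayList<>();
        for (String field : selectedFields) {
            Function<PartitionedFile, Literal> extractor = EXTRACTORS.get(field);
            if (extractor == null) {
                throw new IllegalArgumentException("Unknown metadata field: " + field);
            }
            values.add(extractor.apply(file));
        }
        return values;
    }

    public static void main(String[] args) {
        PartitionedFile file = new PartitionedFile("/data/part-00000.parquet", 1024L);
        // Selecting only file_size never invokes the file_name extractor.
        System.out.println(extract(file, List.of("file_size")).get(0).value);
        System.out.println(extract(file, List.of("file_name")).get(0).value);
    }
}
```

Because each extractor is a function rather than a pre-computed entry in the
metadata map, a query that selects only {{file_size}} pays nothing for
{{file_name}}. A real implementation would presumably key the extractors by the
metadata schema's field names and return typed Catalyst literals.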
--
This message was sent by Atlassian Jira
(v8.20.10#820010)