[ https://issues.apache.org/jira/browse/FLINK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225486#comment-17225486 ]
Timo Walther commented on FLINK-19903:
--------------------------------------
Let me loop in [~lzljs3620320] because I'm not sure what our plans are with the
connectors. I guess at some point `CsvInputFormat` will be replaced by the new
source interfaces. But if this is far in the future, we can make metadata
available sooner.
I think it is a good idea to already think about useful metadata other than
just the filename. Thanks for collecting a set of metadata attributes. In the
end, those attributes must be available through Flink's {{FileSystem}}
abstraction.
> Allow to read metadata in filesystem connector
> ----------------------------------------------
>
> Key: FLINK-19903
> URL: https://issues.apache.org/jira/browse/FLINK-19903
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / API
> Reporter: Ruben Laguna
> Priority: Major
> Attachments: image-2020-11-03-08-53-03-714.png
>
>
> Use case:
> I have a dataset where they embedded some information in the filenames
> (200k files) and I need to extract that as a new column.
> In Spark I could use
> `.withColumn("id",f.split(f.reverse(f.split(f.input_file_name(),'/'))[0],'\.')[0])`
> but I don't see how I can do the same with Flink.
>
> Apparently there is
> [FLIP-107|https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]
> which would allow SQL connectors and formats to expose metadata.
>
> So it would be great for the Filesystem SQL connector to expose the path.
> Ideally for me the path could be exposed via a function that reads the
> metadata. So I could write something akin to `SELECT input_file_name(),*
> FROM table1`
>
>
> [1]:
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]
> [2]:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-I-get-the-filename-as-a-column-td39096.html
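
For reference, the Spark expression quoted in the issue description takes the last segment of the file path and keeps everything before the first dot. A minimal sketch of that logic in plain Python, independent of both Spark and Flink; the function name `id_from_path` and the sample path are hypothetical and only illustrate what the new column would contain:

```python
def id_from_path(path: str) -> str:
    """Return the part of the file name that precedes the first dot.

    Mirrors split(reverse(split(path, '/'))[0], '\\.')[0] from the Spark
    snippet: last path segment first, then the text before the first '.'.
    """
    # Last path segment, i.e. the file name without its directories.
    file_name = path.split("/")[-1]
    # Everything before the first dot; the whole name if there is no dot.
    return file_name.split(".")[0]


if __name__ == "__main__":
    # Hypothetical path, not taken from the reporter's dataset.
    print(id_from_path("hdfs://namenode/data/abc123.csv"))
```

Once the filesystem connector exposes the file path as metadata, the same extraction could presumably be applied to that column with ordinary string functions.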