[
https://issues.apache.org/jira/browse/FLINK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225232#comment-17225232
]
Ruben Laguna commented on FLINK-19903:
--------------------------------------
By the way, I started looking at the implementation of CsvInputFormat, so I
wonder whether you plan to implement this soon or would like me to provide
a PR. My current idea is:
* Implement SupportsReadingMetadata in FileSystemTableSource (delegating to
CsvInputFormatFactory if possible)
* Pass the metadata fields to the format via
FileSystemFormatFactory.ReaderContext
* Implement the handling in CsvInputFormatFactory: have
CsvInputFormat.nextRecord() add the path if it was passed in the ReaderContext
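
As a rough illustration of the three steps above, here is a minimal,
language-agnostic sketch in Python. It is not Flink code: the classes
ReaderContext, CsvReader and FileSystemSource, the method
apply_readable_metadata, and the "path" metadata key are all hypothetical
stand-ins that only loosely mirror the Flink names mentioned in the list.
File contents are injected through a callable so the sketch runs without
touching the disk.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


# Hypothetical stand-ins for illustration only; none of this is Flink API.
@dataclass
class ReaderContext:
    """What the source hands to the format, including requested metadata keys."""
    paths: List[str]
    metadata_keys: List[str] = field(default_factory=list)


class CsvReader:
    """Plays the role of the input format: yields one row per CSV line."""

    def __init__(self, context: ReaderContext, open_file: Callable[[str], str]):
        self.context = context
        self.open_file = open_file  # maps a path to the text of that file

    def records(self) -> Iterator[List[str]]:
        for path in self.context.paths:
            for row in csv.reader(io.StringIO(self.open_file(path))):
                # Step 3: append the path only if it was requested via the context.
                if "path" in self.context.metadata_keys:
                    row = row + [path]
                yield row


class FileSystemSource:
    """Plays the role of the table source that owns the list of files."""

    def __init__(self, paths: List[str], open_file: Callable[[str], str]):
        self.paths = paths
        self.open_file = open_file
        self.metadata_keys: List[str] = []

    def apply_readable_metadata(self, keys: List[str]) -> None:
        # Step 1: the planner tells the source which metadata columns are needed.
        self.metadata_keys = list(keys)

    def create_reader(self) -> CsvReader:
        # Step 2: forward the requested metadata keys to the format.
        context = ReaderContext(self.paths, self.metadata_keys)
        return CsvReader(context, self.open_file)


# Usage with two in-memory "files" (made-up paths and contents).
files = {"/data/a1.csv": "1,x\n2,y\n", "/data/b2.csv": "3,z\n"}
source = FileSystemSource(list(files), files.__getitem__)
source.apply_readable_metadata(["path"])
rows = list(source.create_reader().records())
```

With the "path" key requested, every row carries its source file as an extra
trailing column; without it, the reader behaves as before, which is why the
change to the context would be additive in this sketch.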
But maybe I'm being naive about how long this will take to implement and about
the consequences of modifying things like ReaderContext, so before I actually
start on a PR, could you tell me whether it's worth my trying or whether you
would rather do it yourself?
> Allow to read metadata in filesystem connector
> ----------------------------------------------
>
> Key: FLINK-19903
> URL: https://issues.apache.org/jira/browse/FLINK-19903
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / API
> Reporter: Ruben Laguna
> Priority: Major
> Attachments: image-2020-11-03-08-53-03-714.png
>
>
> Use case:
> I have a dataset (200k files) with some information embedded in the
> filenames, and I need to extract it as a new column.
> In Spark I could use `
> .withColumn("id",f.split(f.reverse(f.split(f.input_file_name(),'/'))[0],'\.')[0])`
> but I don't see how I can do the same with Flink.
>
> Apparently there is
> [FLIP-107|https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]
> which would allow SQL connectors and formats to expose metadata.
>
> So it would be great for the Filesystem SQL connector to expose the path.
> Ideally for me, the path would be exposed via a function that reads the
> metadata, so I could write something akin to `SELECT input_file_name(),*
> FROM table1`
>
>
> [1]:
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]
> [2]:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-I-get-the-filename-as-a-column-td39096.html