[ 
https://issues.apache.org/jira/browse/FLINK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225232#comment-17225232
 ] 

Ruben Laguna commented on FLINK-19903:
--------------------------------------

By the way, I started looking at the implementation of CsvInputFormat, so I 
wonder whether you are going to implement this soon or would like me to 
provide a PR. My current idea is:
 * Implement SupportsReadingMetadata in FileSystemTableSource (delegating to 
CsvInputFormatFactory if possible)
 * Pass the metadata fields to the format via 
FileSystemFormatFactory.ReaderContext
 * Implement it in CsvInputFormatFactory: change CsvInputFormat.nextRecord() 
to add the path if it was passed in the ReaderContext
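
To make the last two steps concrete, here is a rough sketch of the idea. It is 
written in Python only for brevity and does not use any Flink API: the 
ReaderContext class below is a hypothetical stand-in for 
FileSystemFormatFactory.ReaderContext, and metadata_keys, read_records and the 
"path" key are names I made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence, TextIO


@dataclass
class ReaderContext:
    """Hypothetical stand-in for the reader context (not Flink's class)."""
    paths: Sequence[str]
    # Assumed extension: metadata fields requested by the table source.
    metadata_keys: Sequence[str] = ()


def read_records(
    context: ReaderContext,
    open_file: Callable[[str], TextIO] = open,
) -> Iterator[List[str]]:
    """Yield CSV rows, appending the file path when it was requested."""
    want_path = "path" in context.metadata_keys
    for path in context.paths:
        with open_file(path) as handle:
            for row in csv.reader(handle):
                # Plays the role of nextRecord(): add the path as a last column.
                yield (row + [path]) if want_path else row


# In-memory files so the sketch runs without touching the filesystem.
files = {"/data/id1.csv": "a,1\n", "/data/id2.csv": "b,2\n"}
ctx = ReaderContext(paths=list(files), metadata_keys=["path"])
rows = list(read_records(ctx, lambda p: io.StringIO(files[p])))
```

The point is only that the format stays unaware of metadata unless the context 
asks for it, so existing readers that do not request the path are unaffected.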

But maybe I'm being naive about how long this will take to implement and about 
the consequences of modifying things like ReaderContext. So before I actually 
start on a PR, could you tell me whether it's worth my trying or whether you 
would rather do it yourself?

> Allow to read metadata in filesystem connector
> ----------------------------------------------
>
>                 Key: FLINK-19903
>                 URL: https://issues.apache.org/jira/browse/FLINK-19903
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / API
>            Reporter: Ruben Laguna
>            Priority: Major
>         Attachments: image-2020-11-03-08-53-03-714.png
>
>
> Use case: 
> I have a dataset (200k files) with some information embedded in the 
> filenames, and I need to extract it as a new column.
> In Spark I could use
> `.withColumn("id",f.split(f.reverse(f.split(f.input_file_name(),'/'))[0],'\.')[0])`
> but I don't see how I can do the same with Flink.
>  
> Apparently there is 
> [FLIP-107|https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]
> which would allow SQL connectors and formats to expose metadata. 
>  
> So it would be great for the Filesystem SQL connector to expose the path. 
> Ideally, the path would be exposed via a function that reads the 
> metadata, so I could write something akin to `SELECT input_file_name(),* 
> FROM table1`
>  
>  
> [1]: 
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]
> [2]: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-I-get-the-filename-as-a-column-td39096.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
