[
https://issues.apache.org/jira/browse/FLINK-19903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruben Laguna updated FLINK-19903:
---------------------------------
Description:
Use case:
I have a dataset (200k files) in which some information is embedded in the
filenames, and I need to extract it as a new column.
In Spark I could use
`.withColumn("id",f.split(f.reverse(f.split(f.input_file_name(),'/'))[0],'\.')[0])`
but I don't see how I can do the same with Flink.
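For reference, the Spark expression above boils down to the plain-Python logic below. This is only a minimal sketch of the intended extraction: the function name `id_from_path` and the sample paths are made up for illustration and are not part of Spark or Flink.

```python
def id_from_path(path: str) -> str:
    """Return the part of the file name that precedes its first dot."""
    # Last path segment; equivalent to reverse(split(path, '/'))[0] in the Spark expression.
    file_name = path.split("/")[-1]
    # Text before the first dot; equivalent to split(file_name, '\\.')[0].
    return file_name.split(".")[0]


if __name__ == "__main__":
    # Hypothetical paths, used only to show the call.
    print(id_from_path("hdfs://namenode/data/abc123.csv"))
    print(id_from_path("/tmp/input/sensor42.2020.json"))
```

What is missing on the Flink side is only the input to this logic, i.e. the path of the file each row was read from.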
Apparently there is
[FLIP-107|https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]
which would allow SQL connectors and formats to expose metadata.
So it would be great for the Filesystem SQL connector to expose the path.
Ideally for me the path could be exposed via a function that reads the metadata,
so I could write something akin to `SELECT input_file_name(),* FROM table1`
[1]:
[https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]
[2]:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-I-get-the-filename-as-a-column-td39096.html
was:
Use case:
I have a dataset where they embedded some information in the filenames
(200k files) and I need to extract that as a new column.
In Spark I could `
.withColumn("id",f.split(f.reverse(f.split(f.input_file_name(),'/'))[0],'\.')[0])`
but I don't see how can I do the same with Flink.
Apparently there is
[FLIP-107|https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]]which
would allow SQL connectors and formats to expose metadata.
So it would be great for the Filesystem SQL connector to expose the path.
Ideally for me the path could be exposed via a function that read the metadata.
So I could write something akin to `SELECT input_file_name(),* FROM table1`
[1]:
[https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors]
[2]:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-I-get-the-filename-as-a-column-td39096.html
> Allow to read metadata in filesystem connector
> ----------------------------------------------
>
> Key: FLINK-19903
> URL: https://issues.apache.org/jira/browse/FLINK-19903
> Project: Flink
> Issue Type: New Feature
> Components: Connectors / FileSystem, Table SQL / Ecosystem
> Reporter: Ruben Laguna
> Priority: Minor
> Labels: auto-deprioritized-major, pull-request-available
> Attachments: image-2020-11-03-08-53-03-714.png