[
https://issues.apache.org/jira/browse/FLINK-10989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann updated FLINK-10989:
----------------------------------
Description:
The {{OrcRowInputFormat}} seems to use two different {{FileSystem}}
implementations: Flink's {{FileSystem}} for listing the files and generating
the {{InputSplits}}, and Hadoop's {{FileSystem}} for actually reading the input
splits. This can be problematic if one configures only Flink's S3
{{FileSystem}} but does not provide an S3 implementation for Hadoop's
{{FileSystem}}.
I think this behaviour is not intuitive and can lead to hard-to-debug
problems for users.
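
To illustrate the failure mode, here is a minimal toy model. It is not Flink or Hadoop code: the class {{TwoFileSystemsSketch}}, its methods, and the two scheme registries are hypothetical stand-ins for "file systems known to Flink" and "file systems known to Hadoop". It assumes that only the Flink side has an S3 implementation configured, and shows how split generation can succeed while the later read of the same path fails.

```java
import java.net.URI;
import java.util.Set;

/** Toy model only: two independent scheme registries, not real Flink/Hadoop APIs. */
final class TwoFileSystemsSketch {

    // Assumption: S3 is configured for the Flink-side registry only.
    static final Set<String> FLINK_SCHEMES = Set.of("file", "s3");
    static final Set<String> HADOOP_SCHEMES = Set.of("file", "hdfs");

    /** Split generation resolves the path through the Flink-side registry. */
    static String createSplit(String path) {
        requireScheme(FLINK_SCHEMES, path, "Flink");
        return path;
    }

    /** Reading the split resolves the same path through the Hadoop-side registry. */
    static String openSplit(String split) {
        requireScheme(HADOOP_SCHEMES, split, "Hadoop");
        return "reader for " + split;
    }

    private static void requireScheme(Set<String> registry, String path, String owner) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null || !registry.contains(scheme)) {
            throw new IllegalStateException(
                "No " + owner + " file system for scheme '" + scheme + "'");
        }
    }

    public static void main(String[] args) {
        // Listing and split creation work because the Flink side knows "s3".
        String split = createSplit("s3://bucket/data.orc");
        try {
            openSplit(split);
        } catch (IllegalStateException e) {
            // The error only surfaces at read time, far from the configuration.
            System.out.println("Read failed after split creation succeeded: " + e.getMessage());
        }
    }
}
```

In this sketch the error appears only when the split is opened, which is why the mismatch may be hard to trace back to the file system configuration.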
was: The {{OrcRowInputFormat}} seems to use two different {{FileSystem}}. The
Flink {{FileSystem}} for listing the files and generating the {{InputSplits}}
and then Hadoop's {{FileSystem}} to actually read the input splits. This can be
problematic if one only configures Flink's S3 {{FileSystem}} but does not
provide a S3 implementation for Hadoop's {{FileSystem}}.
> OrcRowInputFormat uses two different file systems
> -------------------------------------------------
>
> Key: FLINK-10989
> URL: https://issues.apache.org/jira/browse/FLINK-10989
> Project: Flink
> Issue Type: Bug
> Components: Batch Connectors and Input/Output Formats
> Affects Versions: 1.7.0
> Reporter: Till Rohrmann
> Priority: Major