Hi, I am trying to use OrcTableSource to read data stored in Hive tables on HDFS. On a local cluster, I can use OrcTableSource to fetch and deserialize the data.
But when I use the HDFS path, the job fails with a FileNotFoundException. Any help would be appreciated.

Versions:
Flink: 1.7.1
Hive: 2.3.4

*Code snippet:*

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.orc.OrcTableSource;
    import org.apache.flink.table.api.java.BatchTableEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.types.Row;

    final ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment();
    BatchTableEnvironment tableEnvironment = TableEnvironment.getTableEnvironment(environment);

    // ORC source over the table's HDFS directory; the schema comes from our own OrcSchemaProvider class
    OrcTableSource orcTS = OrcTableSource.builder()
            .path("hdfs://host:port/logs/sa_structured_events")
            .forOrcSchema(new OrcSchemaProvider().getStructuredEventsSchema())
            .build();

    tableEnvironment.registerTableSource("OrcTable", orcTS);
    Table result = tableEnvironment.sqlQuery("SELECT * FROM OrcTable");
    DataSet<Row> rowDataSet = tableEnvironment.toDataSet(result, Row.class);

    // execute() needs at least one sink consuming rowDataSet (not shown in this snippet)
    tableEnvironment.execEnv().execute();

*Error:*

    2019-10-14 16:56:26,048 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - DataSource (OrcFile[path=hdfs://host:port/logs/sa_structured_events, schema=struct<customerid:string,eventid:string,subtype:st) (1/1) (9e1ad40a0f0b80ef0ad8d3b2fc58816d) switched from RUNNING to FAILED.
    java.io.FileNotFoundException: File /logs/sa_structured_events/part-00000-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
        at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:517)
        at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:364)
        at org.apache.orc.OrcFile.createReader(OrcFile.java:251)
        at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:225)
        at org.apache.flink.orc.OrcRowInputFormat.open(OrcRowInputFormat.java:63)
        at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:170)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
        at java.lang.Thread.run(Unknown Source)
    2019-10-14 16:56:26,048 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink Java Job at Mon Oct 14 16:56:07 IST 2019 (26a54fbcbd46cd0c4796e7308a2ba3b0) switched from state RUNNING to FAILING.
Regards,
Pritam.
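P.S. One thing I noticed in the trace: the reader opens the file through Hadoop's RawLocalFileSystem, and the path no longer carries the hdfs://host:port prefix. My guess (not verified against the Flink 1.7.1 sources) is that only the path component of the split reaches the ORC reader. The reader then resolves that path against a Hadoop configuration whose default file system is local. The JDK-only sketch below illustrates that symptom with java.net.URI. The class name SchemeDropDemo, the methods pathOnly and qualify, and the host namenode:8020 are hypothetical. No Flink or Hadoop code is involved.

```java
import java.net.URI;

// JDK-only illustration of the symptom seen in the trace; it does not use Flink or Hadoop.
// SchemeDropDemo, pathOnly, qualify and the host "namenode:8020" are hypothetical names.
public class SchemeDropDemo {

    // Keeps only the path component of a fully qualified URI,
    // discarding the "hdfs" scheme and the "namenode:8020" authority.
    static String pathOnly(String qualified) {
        return URI.create(qualified).getPath();
    }

    // Resolves an absolute, scheme-less path against a default file system URI,
    // similar in spirit to how an unqualified path picks up fs.defaultFS.
    static URI qualify(String path, String defaultFs) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        String split = "hdfs://namenode:8020/logs/sa_structured_events/"
                + "part-00000-b2562d39-1097-490c-99dd-672ed12bbb10-c000.snappy.orc";

        String bare = pathOnly(split);
        // With a local default file system, the file is looked up on the local disk.
        System.out.println(qualify(bare, "file:///"));
        // With an HDFS default file system, the same path points back at the NameNode.
        System.out.println(qualify(bare, "hdfs://namenode:8020/"));
    }
}
```

If this is the cause, I assume it would help to point fs.defaultFS in the Hadoop configuration used by the reader at the NameNode. I have not confirmed that yet.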