MohamedAdelHsn opened a new issue, #2098: URL: https://github.com/apache/orc/issues/2098
I am seeing several kinds of corrupted ORC files in Hive transactional tables. The tables are partitioned by date, and selecting data for certain days fails. The symptoms differ from day to day: some partitions contain empty ORC files, and others fail with an exception like this one:

ERROR [main]: CliDriver (SessionState.java:printError(960)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.
java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1672)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.

I see other types of exceptions as well, but so far I have not been able to find the root cause. How can I prevent this issue and resolve it permanently? It affects our business because ad hoc queries against these tables fail.

What surprised me is that setting hive.exec.orc.skip.corrupt.data=true in the Hive session before running the query does not help. Why does Hive still read the corrupted ORC files after this property is enabled?

Env:
Hive on Tez
Hive version: 3.1.1
Hadoop version: 3.1.1
Cluster size: 45 nodes

Your quick support is highly appreciated :-)
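To narrow down which files in a partition are affected, a rough tail check along the following lines might help. This is only a sketch under stated assumptions, not how the ORC reader itself validates files: it assumes the file has first been copied to local disk, that the last byte of an ORC file holds the postscript length, and that the serialized postscript ends with the magic string "ORC". The function name `classify_orc_tail` and its return labels are hypothetical. Files from very old writers may keep the magic only at the head of the file, so a "bad_postscript" result is a hint, not proof of corruption.

```python
import os

ORC_MAGIC = b"ORC"  # magic string assumed to close the postscript


def classify_orc_tail(path):
    """Heuristically classify a local file as 'empty', 'too_short',
    'bad_postscript', or 'ok' by looking only at its last bytes."""
    size = os.path.getsize(path)
    if size == 0:
        return "empty"
    if size < len(ORC_MAGIC) + 1:
        # Not even room for the magic plus the length byte.
        return "too_short"
    with open(path, "rb") as f:
        # Assumption: the final byte stores the postscript length.
        f.seek(-1, os.SEEK_END)
        ps_len = f.read(1)[0]
        if ps_len < len(ORC_MAGIC) or ps_len + 1 > size:
            return "bad_postscript"
        # Assumption: the postscript's last bytes are the magic string.
        f.seek(-(1 + len(ORC_MAGIC)), os.SEEK_END)
        if f.read(len(ORC_MAGIC)) != ORC_MAGIC:
            return "bad_postscript"
    return "ok"
```

For example, after copying a suspect file locally with `hdfs dfs -get`, calling `classify_orc_tail("part-m-00000")` separates zero-length files from files whose tail looks truncated or overwritten, which may help tell an interrupted write apart from a non-ORC file placed in the table directory.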
