MohamedAdelHsn opened a new issue, #2098:
URL: https://github.com/apache/orc/issues/2098

   I am facing multiple issues with corrupted ORC files in Hive transactional 
tables. The tables are partitioned by date, and selecting data for some days 
fails. The failures vary: some files are empty ORC files, and others raise an 
exception such as the following:
   
   ERROR [main]: CliDriver (SessionState.java:printError(960)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.
   java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.
   at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
   at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1672)
   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
   Caused by: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.
   
   There are several other types of exception as well. So far I have not been 
able to find the root cause. How can I prevent this issue and resolve it 
permanently? It impacts our business when we run ad hoc queries.
   
   What surprises me is that setting hive.exec.orc.skip.corrupt.data=true in 
the Hive session before running the query does not help. Why does Hive still 
read the corrupted ORC files after this property is enabled?
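
   To help narrow down which files are affected, the two symptoms described 
above (empty files and an invalid postscript) could be pre-screened with a 
small script. The following is only a minimal sketch, not Hive's actual 
validation logic. It assumes the files have first been copied to local disk 
(for example with `hdfs dfs -get`), and the function name `classify_orc_file` 
is hypothetical. It relies on two properties of the ORC layout: the last byte 
of the file holds the postscript length, and the magic string "ORC" appears 
at the end of the postscript (or, for very old files, only at the start of 
the file).

```python
import os

ORC_MAGIC = b"ORC"


def classify_orc_file(path):
    """Heuristically classify a local file as 'empty', 'bad_postscript', or 'ok'.

    This is a rough pre-screen only: a file reported as 'ok' can still be
    corrupted elsewhere (footer, stripes, compression blocks).
    """
    size = os.path.getsize(path)
    if size == 0:
        return "empty"

    with open(path, "rb") as f:
        # Magic bytes expected at the start of the file.
        head = f.read(len(ORC_MAGIC))
        # Read the magic that should precede the final length byte.
        tail_len = min(size, len(ORC_MAGIC) + 1)
        f.seek(size - tail_len)
        tail = f.read(tail_len)

    # The final byte stores the serialized postscript length.
    ps_len = tail[-1]
    if ps_len < len(ORC_MAGIC) + 1 or ps_len + 1 > size:
        return "bad_postscript"

    # Accept the magic either at the end of the postscript or in the header.
    magic_in_tail = tail[:-1].endswith(ORC_MAGIC)
    magic_in_head = head == ORC_MAGIC
    if not (magic_in_tail or magic_in_head):
        return "bad_postscript"
    return "ok"
```

   Running this over every file in a partition directory would give a list of 
suspect files to compare against the job or ingestion step that wrote them, 
which may help in locating the root cause.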
   
   Environment:
   
   Hive on Tez
   Hive version 3.1.1
   Hadoop version 3.1.1
   
   Cluster size : 45 Nodes
   
   Your quick support would be highly appreciated :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.