[ https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137591#comment-14137591 ]
Pankit Thapar commented on HIVE-8137: ------------------------------------- I think Tez works in this case because in Tez related code flow, hive creates files for empty tables. I dont know if that would be the right approach for OrcInputFormat. Also, one way to avoid creating split would be to list file status in CombineHiveInputFormat.getSplits() and filter out zero length files. then pass on that list to hadoop. But going this way, we add an O(n) overhead of getting file status. > Empty ORC file handling > ----------------------- > > Key: HIVE-8137 > URL: https://issues.apache.org/jira/browse/HIVE-8137 > Project: Hive > Issue Type: Improvement > Components: File Formats > Affects Versions: 0.13.1 > Reporter: Pankit Thapar > Fix For: 0.14.0 > > > Hive 13 does not handle reading of a zero size Orc File properly. An Orc file > is suposed to have a post-script > which the ReaderIml class tries to read and initialize the footer with it. > But in case, the file is empty > or is of zero size, then it runs into an IndexOutOfBound Exception because of > ReaderImpl trying to read in its constructor. > Code Snippet : > //get length of PostScript > int psLen = buffer.get(readSize - 1) & 0xff; > In the above code, readSize for an empty file is zero. > I see that ensureOrcFooter() method performs some sanity checks for footer , > so, either we can move the above code snippet to ensureOrcFooter() and throw > a "Malformed ORC file exception" or we can create a dummy Reader that does > not initialize footer and basically has hasNext() set to false so that it > returns false on the first call. > Basically, I would like to know what might be the correct way to handle an > empty ORC file in a mapred job? > Should we neglect it and not throw an exception or we can throw an exeption > that the ORC file is malformed. > Please let me know your thoughts on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)