I've filed this as https://issues.apache.org/jira/browse/ORC-285 . Sorry for the delay in getting the fix out.
.. Owen

On Mon, Dec 18, 2017 at 10:27 AM, Owen O'Malley <[email protected]> wrote:

> This is a bug. Please file a jira. It looks like a change went in that
> made the DoubleTreeReader fail if it is called on a batch of size 0.
>
> Thanks,
>    Owen
>
> On Mon, Dec 18, 2017 at 10:19 AM, Owen O'Malley <[email protected]>
> wrote:
>
>> Actually, the metadata is reasonable, it is just that there is an array
>> above that column that doesn't have any elements.
>>
>> So the tree down to column 36 looks like:
>>
>> column 0: (struct) count: 42692
>>   column 1: data (struct) count: 42692
>>     column 21: listingAssociated (array) count: 42692
>>       column 22: (struct) count: 0
>>         column 32: sla (array) count: 0
>>           column 33: (struct) count: 0
>>             column 34: shippingTier (struct) count: 0
>>               column 35: charge (struct) count: 0
>>                 column 36: amount (double) count: 0
>>
>> Since there are 0 instances of column 22, there aren't any instances
>> below that. So what should be happening is that the reader doesn't call
>> down to read the data because there are no values.
>>
>> Which version of ORC are you using to read with?
>>
>> Thanks,
>>    Owen
>>
>>
>> On Mon, Dec 18, 2017 at 5:38 AM, Piyush Mukati <[email protected]>
>> wrote:
>>
>>> Hi,
>>> I have written an ORC file with a map-reduce job, but while reading the
>>> file I am getting "read past EOF for a double column".
>>> After debugging I found that we are trying to read an empty stream. I
>>> suspect the file metadata is corrupt.
>>>
>>> The column metadata says:
>>> *Column 36: count: 0 hasNull: false sum: 0.0*
>>> I am not able to understand how hasNull can be false and count can be
>>> zero while other columns have non-zero counts.
>>>
>>> I am out of ideas on debugging. Please point me in the direction I
>>> should debug further.
>>> Please find attached the metadata and the stack trace.
>>> Thanks.
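
For anyone following the thread, here is a rough sketch of the behaviour described above: when an array has no elements, the reader should not descend into its child columns, and a leaf reader asked for a batch of size 0 should return without touching its stream. This is a toy model in Python, not ORC's actual reader code. The class names `DoubleReader` and `ListReader`, the `read_batch` method, and the little-endian stream layout are all assumptions made up for illustration.

```python
import struct
from io import BytesIO


class DoubleReader:
    """Toy leaf reader (hypothetical): decodes 8-byte doubles from a stream."""

    def __init__(self, data: bytes):
        self.stream = BytesIO(data)

    def read_batch(self, size: int) -> list:
        if size == 0:
            # Guard in the leaf: an empty batch never touches the stream.
            return []
        raw = self.stream.read(8 * size)
        if len(raw) < 8 * size:
            raise EOFError("read past EOF for a double column")
        return list(struct.unpack(f"<{size}d", raw))


class ListReader:
    """Toy array reader (hypothetical): per-row lengths plus a child reader."""

    def __init__(self, lengths, child):
        self.lengths = list(lengths)
        self.child = child
        self.pos = 0

    def read_batch(self, size: int) -> list:
        lengths = self.lengths[self.pos:self.pos + size]
        self.pos += len(lengths)
        total = sum(lengths)
        # Guard in the parent: with no elements there is nothing to read below.
        values = self.child.read_batch(total) if total > 0 else []
        rows, offset = [], 0
        for n in lengths:
            rows.append(values[offset:offset + n])
            offset += n
        return rows


# Three rows whose arrays are all empty, over an empty double stream.
empty = ListReader([0, 0, 0], DoubleReader(b""))
print(empty.read_batch(3))
```

Either guard on its own is enough to avoid the EOF in this toy case. Keeping both means the leaf stays safe even if some other parent calls it with an empty batch.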
