This is a bug. Please file a JIRA issue. It looks like a change went in that makes the DoubleTreeReader fail when it is called on a batch of size 0.

Thanks,
   Owen

On Mon, Dec 18, 2017 at 10:19 AM, Owen O'Malley <[email protected]> wrote:

> Actually, the metadata is reasonable, it is just that there is an array
> above that column that doesn't have any elements.
>
> So the tree down to column 36 looks like:
>
> column 0: (struct) count: 42692
> column 1: data (struct) count: 42692
> column 21: listingAssociated (array) count: 42692
> column 22: (struct) count: 0
> column 32: sla (array) count: 0
> column 33: (struct) count: 0
> column 34: shippingTier (struct) count: 0
> column 35: charge (struct) count: 0
> column 36: amount (double) count: 0
>
> Since there are 0 instances of column 22, there aren't any instances below
> that. So what should be happening is that the reader doesn't call down to
> read the data because there are no values.
>
> Which version of ORC are you using to read with?
>
> Thanks,
>    Owen
>
>
> On Mon, Dec 18, 2017 at 5:38 AM, Piyush Mukati <[email protected]>
> wrote:
>
>> Hi,
>> I have written one ORC file with a map-reduce job. But while reading the
>> file I am getting "read past EOF for a double column".
>> After debugging I found that we are trying to read an empty stream. I
>> suspect the file metadata is corrupt,
>> as the column metadata says:
>> *Column 36: count: 0 hasNull: false sum: 0.0*
>> I am not able to understand how hasNull=false and count can be zero
>> while other columns have non-zero counts.
>>
>> I am out of ideas on debugging. Please help me with the direction I
>> should debug further.
>> Please find attached the metadata and the stack trace.
>> Thanks.
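
For illustration, the guard described in the thread (the parent reader not calling down to its child when there are no values) can be sketched as a toy model in Python. This is not ORC's code: EagerDoubleReader and ListReader are hypothetical names, and the leaf reader is assumed to do an unconditional first read, which is only one plausible way a reader could fail on a batch of size 0.

```python
import io
import struct


class EagerDoubleReader:
    """Hypothetical leaf reader that assumes batch_size >= 1."""

    def __init__(self, data: bytes):
        self.stream = io.BytesIO(data)

    def _read_double(self) -> float:
        raw = self.stream.read(8)
        if len(raw) < 8:
            raise EOFError("read past EOF for a double column")
        return struct.unpack("<d", raw)[0]

    def next_vector(self, batch_size: int) -> list:
        # The first read happens unconditionally, so an empty stream
        # raises even when the caller asked for zero values.
        values = [self._read_double()]
        for _ in range(batch_size - 1):
            values.append(self._read_double())
        return values


class ListReader:
    """Hypothetical array reader holding per-row lengths and one child."""

    def __init__(self, lengths: list, child: EagerDoubleReader):
        self.lengths = lengths
        self.child = child

    def next_vector(self, batch_size: int) -> list:
        lengths = self.lengths[:batch_size]
        child_count = sum(lengths)
        # Guard: with zero child values there is nothing to read below,
        # so the child reader is never called.
        elements = self.child.next_vector(child_count) if child_count > 0 else []
        rows, pos = [], 0
        for n in lengths:
            rows.append(elements[pos:pos + n])
            pos += n
        return rows
```

In this model, a ListReader whose rows are all empty never touches the child's empty stream, while calling the child directly with a batch size of 0 raises the EOF error.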
