Actually, the metadata is reasonable; it is just that there is an array above that column that doesn't have any elements.
So the tree down to column 36 looks like:

column 0: (struct) count: 42692
  column 1: data (struct) count: 42692
    column 21: listingAssociated (array) count: 42692
      column 22: (struct) count: 0
        column 32: sla (array) count: 0
          column 33: (struct) count: 0
            column 34: shippingTier (struct) count: 0
              column 35: charge (struct) count: 0
                column 36: amount (double) count: 0

Since there are 0 instances of column 22, there aren't any instances below it. So what should be happening is that the reader doesn't call down to read the data, because there are no values.

Which version of ORC are you using to read with?

Thanks,
   Owen

On Mon, Dec 18, 2017 at 5:38 AM, Piyush Mukati <[email protected]> wrote:

> Hi,
> I have written one orc file with map-reduce job. But while reading the
> file I am getting "read past EOF for a double column".
> After debugging I found that we are trying to read an empty stream. I am
> suspecting the file meta to be corrupt.
>
> as the column meta says:
> *Column 36: count: 0 hasNull: false sum: 0.0*
> I am not able to understand how hasNull=false and count can be zero.
> while other columns have non zero counts.
>
> I am out of ideas on debugging. Please help me with the direction I
> should debug further.
> please find attached meta and the stack trace.
> Thanks.
>
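To illustrate the point in the reply about not calling down into child columns of an empty array, here is a minimal Python sketch. It is not ORC's actual reader code: the `DoubleReader` and `ListReader` classes are hypothetical, the little-endian 8-byte double encoding is an assumption made for the sketch, and the real nesting (array, struct, array, struct, ..., double) is collapsed to a single list of doubles. It only shows why a parent with zero elements should never trigger a read on an empty child stream, and what happens if the child is read anyway.

```python
import struct
from io import BytesIO


class DoubleReader:
    """Hypothetical leaf reader: decodes 8-byte little-endian doubles."""

    def __init__(self, data: bytes):
        self.stream = BytesIO(data)

    def read(self, count: int) -> list:
        values = []
        for _ in range(count):
            raw = self.stream.read(8)
            if len(raw) < 8:
                # Asking an empty (or exhausted) stream for a value fails.
                raise EOFError("read past EOF for a double column")
            values.append(struct.unpack("<d", raw)[0])
        return values


class ListReader:
    """Hypothetical array reader: per-row lengths plus one child reader."""

    def __init__(self, lengths: list, child: DoubleReader):
        self.lengths = lengths
        self.child = child
        self.pos = 0

    def read(self, rows: int) -> list:
        lengths = self.lengths[self.pos:self.pos + rows]
        self.pos += rows
        total = sum(lengths)
        if total == 0:
            # Every array in this batch is empty, so the child column has
            # no values and its stream must not be touched at all.
            return [[] for _ in lengths]
        flat = self.child.read(total)
        out, offset = [], 0
        for n in lengths:
            out.append(flat[offset:offset + n])
            offset += n
        return out


# Five rows whose arrays are all empty; the child stream is empty too.
reader = ListReader([0] * 5, DoubleReader(b""))
result = reader.read(5)
print(result)
```

With the guard in place, the empty child stream is never read. Calling `DoubleReader(b"").read(1)` directly, as a reader without the guard effectively would, raises the `EOFError`.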
