Hi All

The Excel file formats (.xls and .xlsx) are somewhat sparse formats: where a cell has never been used, it generally doesn't get written to the file. (Being a Microsoft format, there are exceptions to this...) Currently, if you parse a file with cells at A1, B1, F1 and G1, Tika will give you back a table with just 4 columns, squashing the gap between B1 and F1.

Within POI, there is optional logic to detect these gaps and generate dummy cells to let you know that something was missed. So, if we wanted, we could detect and handle these gaps without too much work.
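
To make the idea concrete, here is a rough sketch in plain Java of what gap filling would mean for the A1 B1 F1 G1 example. It is not POI's or Tika's actual API: the class GapFiller, the method fillGaps, and the use of an empty string as the dummy cell are all hypothetical, and it assumes zero-based, non-negative column indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from POI or Tika.
public class GapFiller {

    /**
     * Expands a sparse row (zero-based column index -> cell text) into a
     * dense list, inserting an empty string for every column that was
     * skipped before the last cell present in the row.
     */
    public static List<String> fillGaps(Map<Integer, String> sparseCells) {
        List<String> dense = new ArrayList<>();
        int nextColumn = 0;
        // A TreeMap copy guarantees we walk the cells in column order.
        for (Map.Entry<Integer, String> cell : new TreeMap<>(sparseCells).entrySet()) {
            // Emit a dummy (empty) cell for each column missing before this one.
            while (nextColumn < cell.getKey()) {
                dense.add("");
                nextColumn++;
            }
            dense.add(cell.getValue());
            nextColumn++;
        }
        return dense;
    }

    public static void main(String[] args) {
        Map<Integer, String> row = new TreeMap<>();
        row.put(0, "A1");
        row.put(1, "B1");
        row.put(5, "F1");
        row.put(6, "G1");
        // Without gap filling the row has 4 cells; with it, 7 (C, D, E are empty).
        System.out.println(fillGaps(row));
    }
}
```

With something like this, the example row would come out as 7 columns rather than 4, with empty cells for C1, D1 and E1. Trailing unused cells after the last real cell are still not represented in this sketch.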

However, I'm not sure whether that's something we should be doing. What do people think - should we be doing that level of processing before generating the SAX events, or would that be a step too far?

Nick
