Hi All,
The Excel file formats (.xls and .xlsx) are somewhat sparse formats, and
where a cell has never been used it generally doesn't get written to the
file. (Being a Microsoft format, there are exceptions to this...).
Currently, if you parse a file with cells at A1, B1, F1 and G1, then Tika
will give you back a table with just 4 columns, squashing the gaps.
Within POI, there is optional logic to detect these gaps, and generate
dummy cells to let you know that something was missed. So, if we wanted,
we could detect and handle these gaps without too much work.
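
To make the idea concrete, here is a rough sketch in plain Java of what
filling the gaps would mean for the A1 B1 F1 G1 example. It deliberately
does not use the POI API: GapFiller and fillGaps are made-up names for
illustration only, and the row is assumed to arrive as a map from 0-based
column index to cell text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Tika or POI.
class GapFiller {
    // Takes the cells actually present in a row, keyed by 0-based column
    // index, and returns a dense list with "" standing in for every
    // column that was never written to the file.
    static List<String> fillGaps(SortedMap<Integer, String> presentCells) {
        List<String> dense = new ArrayList<>();
        int nextColumn = 0;
        for (Map.Entry<Integer, String> cell : presentCells.entrySet()) {
            // Emit a dummy (empty) cell for each skipped column.
            while (nextColumn < cell.getKey()) {
                dense.add("");
                nextColumn++;
            }
            dense.add(cell.getValue());
            nextColumn++;
        }
        return dense;
    }

    public static void main(String[] args) {
        // Cells at A1, B1, F1, G1 (columns 0, 1, 5, 6).
        SortedMap<Integer, String> row = new TreeMap<>();
        row.put(0, "a1");
        row.put(1, "b1");
        row.put(5, "f1");
        row.put(6, "g1");
        // Seven entries come back instead of four; C, D and E are empty.
        System.out.println(fillGaps(row));
    }
}
```

With this approach the consumer would see 7 columns rather than 4, so F1
and G1 stay in their original positions.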
However, I'm not sure whether that's something we should be doing. What
do people think: should we be doing that level of processing before
generating the SAX events, or would that be a step too far?
Nick
- Excel files with "holes" in the cell sequence Nick Burch