The last time I parsed spreadsheets with POI, I found a lot of functionality for rendering the spreadsheet's layout in CSS. Does anyone think that would be a worthwhile endeavor, or even feasible? I would love to become a committer to Tika.
On Tue, Oct 8, 2013 at 9:14 AM, Nick Burch <[email protected]> wrote:
> Hi All
>
> The Excel file formats (.xls and .xlsx) are somewhat sparse formats, and
> where a cell has never been used it generally doesn't get written to the
> file. (Being a Microsoft format, there are exceptions to this...).
> Currently, if you parse a file with cells at A1 B1 F1 G1, then Tika will
> give you back a table with just 4 columns in, squashing the gaps.
>
> Within POI, there is optional logic to detect these gaps, and generate
> dummy cells to let you know that something was missed. So, if we wanted,
> with not too much work we could detect and handle these
>
> However, I'm not sure if that's something we should be doing or not? What
> do people think - should we be doing that level of processing before
> generating the SAX events, or would that be a step too far?
>
> Nick
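For anyone weighing Nick's question, here is a minimal sketch of the gap-filling idea in plain Java, independent of POI. It assumes the parser can see each cell's reference (such as "F1") alongside its text. `SparseRowPadder`, `columnIndex` and `padRow` are hypothetical names for illustration, not part of the POI or Tika API. Columns that were never written come back as empty strings, roughly what dummy cells would let Tika emit as empty table cells instead of squashing the gaps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not POI/Tika API): pads a sparse row so skipped columns become empty cells. */
public class SparseRowPadder {

    /** Converts the column letters of a reference such as "F1" or "AA7" to a 0-based index. */
    static int columnIndex(String cellRef) {
        int col = 0;
        for (char c : cellRef.toCharArray()) {
            if (!Character.isLetter(c)) {
                break; // stop at the row number, e.g. the "1" in "F1"
            }
            col = col * 26 + (Character.toUpperCase(c) - 'A' + 1);
        }
        return col - 1;
    }

    /** Returns the row's values in column order, with "" for every column never written. */
    static List<String> padRow(Map<String, String> sparseCells) {
        // Sort the cells that were actually present by their column index
        TreeMap<Integer, String> byColumn = new TreeMap<>();
        for (Map.Entry<String, String> cell : sparseCells.entrySet()) {
            byColumn.put(columnIndex(cell.getKey()), cell.getValue());
        }
        List<String> row = new ArrayList<>();
        if (byColumn.isEmpty()) {
            return row;
        }
        // Walk from column A to the last used column, inserting a dummy "" for each gap
        for (int col = 0; col <= byColumn.lastKey(); col++) {
            row.add(byColumn.getOrDefault(col, ""));
        }
        return row;
    }

    public static void main(String[] args) {
        // The sparse row from the example: only A1, B1, F1 and G1 exist in the file
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("A1", "a");
        cells.put("B1", "b");
        cells.put("F1", "f");
        cells.put("G1", "g");
        System.out.println(padRow(cells)); // prints [a, b, , , , f, g]
    }
}
```

With the A1 B1 F1 G1 example, the row comes back with seven entries instead of four. This sketch also pads from column A, so a row whose first cell is C1 gets two leading blanks; whether Tika should do that, or only fill interior gaps, is part of the same design question.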
