The last time I parsed spreadsheets with POI, I found a lot of
functionality for rendering the layout of the spreadsheet in CSS. Does
anyone think that would be a worthwhile endeavor, or even feasible? I
would love to become a Tika committer.


On Tue, Oct 8, 2013 at 9:14 AM, Nick Burch <[email protected]> wrote:

> Hi All
>
> The Excel file formats (.xls and .xlsx) are somewhat sparse formats, and
> where a cell has never been used it generally doesn't get written to the
> file. (Being a Microsoft format, there are exceptions to this...).
> Currently, if you parse a file with cells at A1, B1, F1 and G1, then Tika
> will give you back a table with just 4 columns, squashing the gaps.
>
> Within POI, there is optional logic to detect these gaps, and generate
> dummy cells to let you know that something was missed. So, if we wanted,
> we could detect and handle these gaps without too much work.
>
> However, I'm not sure if that's something we should be doing or not? What
> do people think - should we be doing that level of processing before
> generating the SAX events, or would that be a step too far?
>
> Nick
>
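
To make the A1 B1 F1 G1 case above concrete, here is a minimal sketch of
what filling the gaps would mean for a single row. It does not use POI or
Tika at all: the GapFiller class, the fillGaps method and the use of an
empty string as the dummy cell are all assumptions made up for
illustration, and columns are assumed to be 0-based and non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of POI or Tika.
class GapFiller {

    // Expands a sparse row (0-based column index -> cell text) into a dense
    // list, inserting "" as a dummy cell for every column that is missing.
    static List<String> fillGaps(Map<Integer, String> sparseRow) {
        TreeMap<Integer, String> sorted = new TreeMap<>(sparseRow);
        List<String> dense = new ArrayList<>();
        if (sorted.isEmpty()) {
            return dense; // nothing was written for this row
        }
        int lastColumn = sorted.lastKey();
        for (int col = 0; col <= lastColumn; col++) {
            dense.add(sorted.getOrDefault(col, ""));
        }
        return dense;
    }

    public static void main(String[] args) {
        Map<Integer, String> row = new TreeMap<>();
        row.put(0, "A1");
        row.put(1, "B1");
        row.put(5, "F1");
        row.put(6, "G1");
        // Seven columns come back instead of four: C, D and E are empty.
        System.out.println(fillGaps(row)); // prints [A1, B1, , , , F1, G1]
    }
}
```

Only gaps up to the last used column are padded here. Whether trailing
columns or entirely missing rows should also be padded is a separate
question that this sketch does not try to answer.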
