[ 
https://issues.apache.org/jira/browse/TIKA-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346025#comment-14346025
 ] 

Tyler Palsulich commented on TIKA-1020:
---------------------------------------

Have we changed our minds on this issue? Do we want to track empty cells in 
Excel sheets? The above is an argument for it, but what about a sheet with 
one row of data, then 100 empty rows, then another row of data?
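
If each cell carried its reference, a consumer could rebuild the gaps itself, so the parser would not need to emit anything for the empty cells or rows. A rough sketch of the consumer side, assuming the reference arrives as a plain A1-style string (no `$` or sheet prefix) and that cells within a row arrive in column order; `CellRefs`, `columnIndex` and `padRow` are made-up names for illustration, not Tika or POI API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Tika or POI.
final class CellRefs {

    // Converts the column letters of an A1-style reference to a zero-based
    // column index, e.g. "A1" -> 0, "C3" -> 2, "AA10" -> 26.
    static int columnIndex(String cellRef) {
        int col = 0;
        int i = 0;
        while (i < cellRef.length() && Character.isLetter(cellRef.charAt(i))) {
            col = col * 26 + (Character.toUpperCase(cellRef.charAt(i)) - 'A' + 1);
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("No column letters in: " + cellRef);
        }
        return col - 1;
    }

    // Rebuilds one row from (cellRef -> value) pairs, inserting "" for every
    // skipped column. Assumes the map iterates in column order.
    static List<String> padRow(Map<String, String> cells) {
        List<String> row = new ArrayList<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            int col = columnIndex(cell.getKey());
            while (row.size() < col) {
                row.add("");
            }
            row.add(cell.getValue());
        }
        return row;
    }

    public static void main(String[] args) {
        // Third row of the example table in the issue: column B is empty.
        Map<String, String> thirdRow = new LinkedHashMap<>();
        thirdRow.put("A3", "4");
        thirdRow.put("C3", "6");
        System.out.println(padRow(thirdRow));
    }
}
```

The same idea would cover the sparse-row case: the row number in the reference tells the consumer how many rows were skipped, without 100 empty rows appearing in the output.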

> Excel 2010 parser missing cell values are not reported resulting in missing 
> columns values
> ------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1020
>                 URL: https://issues.apache.org/jira/browse/TIKA-1020
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.2
>         Environment: java 1.6 & 1.7 
>            Reporter: Neil Blue
>              Labels: newbie, patch
>
> When parsing an Excel 2010 table, if a worksheet has a missing value, then it 
> is not reported to the SAX handler. As a result, a missing value can shift 
> later values into the wrong columns.
> For example, given the table:
> {code:title=Bar.java|borderStyle=solid}
> A B C
> 1 2 3
> 4   6
> 7 8 9
> {code}
> the SAX handler reports the elements
> {code:title=Bar.java|borderStyle=solid}
> <tr><td>A</td><td>B</td><td>C</td></tr>
> <tr><td>1</td><td>2</td><td>3</td></tr>
> <tr><td>4</td><td>6</td></tr>
> <tr><td>7</td><td>8</td><td>9</td></tr>
> {code}
> As a result, the handler can detect that the third row has incomplete cell 
> values, but it is ambiguous which columns have missing data.
> As a possible fix, the Excel 2010 XML data contains the cell reference 
> value, which could be passed to the SAX handler as an attribute. 
> {code:title=Bar.java|borderStyle=solid}
> *** XSSFExcelExtractorDecorator.java    2012-11-08 10:51:55.881207100 +0000
> --- XSSFExcelExtractorDecorator.java.1  2012-11-08 10:59:02.972223700 +0000
> ***************
> *** 200,206 ****
>   
>          public void cell(String cellRef, String formattedValue) {
>             try {
> !              xhtml.startElement("td");
>   
>                // Main cell contents
>                xhtml.characters(formattedValue);
> --- 200,208 ----
>   
>          public void cell(String cellRef, String formattedValue) {
>             try {
> !              AttributesImpl attributes = new AttributesImpl();
> !              attributes.addAttribute(null, "cellRef", "cellRef", null, cellRef);
> !              xhtml.startElement("td",attributes);
>   
>                // Main cell contents
>                xhtml.characters(formattedValue);
> {code} 
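
For reference, the attribute construction from the patch above can be tried on its own with only the JDK's SAX classes. This is a sketch, not the committed fix: it passes `""` for the namespace URI and `"CDATA"` for the type, which is the usual SAX convention, where the patch passes `null` for both. The class and method names (`CellRefAttribute`, `forCell`) are made up for illustration.

```java
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical wrapper for illustration only; not part of Tika.
final class CellRefAttribute {

    // Builds the attribute set for one <td>. "cellRef" is the attribute
    // name used in the patch from the issue description.
    static AttributesImpl forCell(String cellRef) {
        AttributesImpl attributes = new AttributesImpl();
        // SAX convention: "" means no namespace URI, "CDATA" an untyped attribute.
        attributes.addAttribute("", "cellRef", "cellRef", "CDATA", cellRef);
        return attributes;
    }

    public static void main(String[] args) {
        AttributesImpl attrs = forCell("C3");
        System.out.println(attrs.getQName(0) + "=" + attrs.getValue("cellRef"));
    }
}
```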



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
