[ 
https://issues.apache.org/jira/browse/TIKA-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Desmond updated TIKA-1880:
-------------------------------
    Description: 
When the ODS parser was first written, it assumed that the 
`number-columns-repeated` attribute on cells would only be used for blank 
cells.  This is not the case for documents created by (at least) LibreOffice 
4.4.7.2.  The current approach to repeated cells is to use the HTML 
concept of spanning, which is not suitable for repeated content.

The note in the Tika source (OpenDocumentContentParser.java#L459):

>TODO: The following is not correct, the cell should be repeated not spanned!
Code generates a HTML cell, spanning all repeated columns, to make the cell 
look correct.  Problems may occur when both spanning and repeating is given, 
which is not allowed by spec.  Cell spanning instead of repeating  is not a 
problem, because OpenOffice uses it only for empty cells.
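
To make the difference concrete, here is a minimal sketch of the two behaviours. It is not the Tika code: the class and method names (`RepeatedCellSketch`, `asSpanned`, `asRepeated`) are hypothetical, cells are modelled as plain strings rather than SAX events, and no HTML escaping is done.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not part of Tika.
public class RepeatedCellSketch {

    // Behaviour described in the TODO: one HTML cell spanning all repeated columns.
    public static List<String> asSpanned(String text, int columnsRepeated) {
        List<String> cells = new ArrayList<>();
        if (columnsRepeated > 1) {
            cells.add("<td colspan=\"" + columnsRepeated + "\">" + text + "</td>");
        } else {
            cells.add("<td>" + text + "</td>");
        }
        return cells;
    }

    // Behaviour the TODO asks for: the cell is emitted once per repeated column.
    public static List<String> asRepeated(String text, int columnsRepeated) {
        List<String> cells = new ArrayList<>();
        int count = Math.max(1, columnsRepeated); // treat a missing or invalid count as 1
        for (int i = 0; i < count; i++) {
            cells.add("<td>" + text + "</td>");
        }
        return cells;
    }

    public static void main(String[] args) {
        // A cell containing "42" with number-columns-repeated="3"
        System.out.println(asSpanned("42", 3));  // [<td colspan="3">42</td>]
        System.out.println(asRepeated("42", 3)); // [<td>42</td>, <td>42</td>, <td>42</td>]
    }
}
```

With spanning, the value "42" appears once in the extracted text; with repetition it appears three times, which matches what the spreadsheet shows. A real fix would probably still want to treat empty cells separately, since padding cells at the end of a row may carry large repeat counts.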


> Tag for number-columns-repeated not correctly used in ODS documents
> -------------------------------------------------------------------
>
>                 Key: TIKA-1880
>                 URL: https://issues.apache.org/jira/browse/TIKA-1880
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.12
>            Reporter: Ryan Desmond
>            Priority: Minor
>              Labels: LibreOffice



