[
https://issues.apache.org/jira/browse/TIKA-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Desmond updated TIKA-1880:
-------------------------------
Description:
When the ODS parser was first written, it assumed that the
`number-columns-repeated` attribute for cells would only be used for blank
cells. This is not the case with documents created by (at least) LibreOffice
4.4.7.2. The current approach to repeated cells is to use the HTML
concept of spanning, which is not suitable for repeated content.
The note in the Tika source (OpenDocumentContentParser.java#L459):
TODO: The following is not correct, the cell should be repeated not spanned!
Code generates a HTML cell, spanning all repeated columns, to make the cell
look correct. Problems may occur when both spanning and repeating is given,
which is not allowed by spec. Cell spanning instead of repeating is not a
problem, because OpenOffice uses it only for empty cells.
was:
When the ODS parser was first written, it assumed that the
`number-columns-repeated` attribute for cells would only be used for blank
cells. This is not the case with documents created by (at least) LibreOffice
4.4.7.2.
The note in the Tika source (OpenDocumentContentParser.java#L459):
TODO: The following is not correct, the cell should be repeated not spanned!
* Code generates a HTML cell, spanning all repeated columns, to make the cell
look correct.
* Problems may occur when both spanning and repeating is given, which is not
allowed by spec.
* Cell spanning instead of repeating is not a problem, because OpenOffice
uses it
* only for empty cells.
*
> Tag for number-columns-repeated not correctly used in ODS documents
> -------------------------------------------------------------------
>
> Key: TIKA-1880
> URL: https://issues.apache.org/jira/browse/TIKA-1880
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.12
> Reporter: Ryan Desmond
> Priority: Minor
> Labels: LibreOffice
>
> When the ODS parser was first written, it assumed that the
> `number-columns-repeated` attribute for cells would only be used for blank
> cells. This is not the case with documents created by (at least) LibreOffice
> 4.4.7.2. The current approach to repeated cells is to use the HTML
> concept of spanning, which is not suitable for repeated content.
> The note in the Tika source (OpenDocumentContentParser.java#L459):
> TODO: The following is not correct, the cell should be repeated not spanned!
> Code generates a HTML cell, spanning all repeated columns, to make the cell
> look correct. Problems may occur when both spanning and repeating is given,
> which is not allowed by spec. Cell spanning instead of repeating is not a
> problem, because OpenOffice uses it only for empty cells.
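
To illustrate the difference between spanning and repeating described above, here is a minimal standalone sketch. It is not the Tika code and does not use any Tika API: the `RepeatedCellSketch` class, the `Cell` type, and both render methods are hypothetical names made up for this example, and HTML escaping is left out for brevity.

```java
import java.util.Arrays;
import java.util.List;

final class RepeatedCellSketch {

    /** Hypothetical stand-in for a parsed table cell and its number-columns-repeated value. */
    static final class Cell {
        final String text;
        final int columnsRepeated;

        Cell(String text, int columnsRepeated) {
            this.text = text;
            this.columnsRepeated = columnsRepeated;
        }
    }

    /** Spanning: one HTML cell stretched over all repeated columns, so the content appears once. */
    static String renderRowSpanned(List<Cell> cells) {
        StringBuilder html = new StringBuilder("<tr>");
        for (Cell cell : cells) {
            int repeat = Math.max(1, cell.columnsRepeated);
            html.append("<td");
            if (repeat > 1) {
                html.append(" colspan=\"").append(repeat).append("\"");
            }
            html.append(">").append(cell.text).append("</td>");
        }
        return html.append("</tr>").toString();
    }

    /** Repeating: one HTML cell per repeated column, so the content appears in every column. */
    static String renderRowRepeated(List<Cell> cells) {
        StringBuilder html = new StringBuilder("<tr>");
        for (Cell cell : cells) {
            int repeat = Math.max(1, cell.columnsRepeated);
            for (int i = 0; i < repeat; i++) {
                html.append("<td>").append(cell.text).append("</td>");
            }
        }
        return html.append("</tr>").toString();
    }

    public static void main(String[] args) {
        // A non-blank cell repeated three times, followed by two blank cells.
        List<Cell> row = Arrays.asList(new Cell("x", 3), new Cell("", 2));
        System.out.println(renderRowSpanned(row));
        System.out.println(renderRowRepeated(row));
    }
}
```

With spanning, the value `x` shows up once in the extracted output even though the sheet holds it in three columns. With repeating, it shows up three times. An actual fix would probably also need to cap or skip very large repeat counts on trailing blank cells, since expanding those naively could produce a lot of empty output.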
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)