[
https://issues.apache.org/jira/browse/TIKA-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362555#comment-14362555
]
Tyler Palsulich commented on TIKA-1140:
---------------------------------------
Two possibly related issues.
> Better table representation, cell spanning in Word Extractor
> ------------------------------------------------------------
>
> Key: TIKA-1140
> URL: https://issues.apache.org/jira/browse/TIKA-1140
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Denis Kildishev
> Priority: Minor
> Attachments: word_table.patch
>
>
> The current version of Word Extractor has access to various table
> features, but most of them are not used. Possible improvements include
> support for borders, fixed cell widths, and cell spanning.
> Some of these features are already used in POI's HTML converter, so that
> code could be reused in Tika.
> The attached patch is one possible solution. Some of its code (especially
> border type and color detection) is based on the 2007 version of the .doc
> format specification, so further changes may be needed to support older
> formats.
> The patch also includes the unit test changes required by the changes in
> document structure.
>
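The cell-spanning part of the proposal comes down to turning per-cell merge markers into `colspan`/`rowspan` values. Below is a minimal Java sketch under stated assumptions: `CellFlags` and `SpanCell` are hypothetical types. Each cell has one flag meaning "continues the cell to its left" and one meaning "continues the cell above it", loosely analogous to the horizontal and vertical merge flags that .doc table cells carry. Merged regions are also assumed to be rectangular. The sketch does not call POI or Tika and is not taken from word_table.patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derive colspan/rowspan from per-cell merge flags. */
final class SpanSketch {

    /** Assumed model, not a POI type: true means this cell is a continuation of a neighbour. */
    record CellFlags(boolean mergedIntoLeft, boolean mergedIntoAbove) {}

    /** One visible output cell (e.g. one <td>) with its grid position and spans. */
    record SpanCell(int row, int col, int colspan, int rowspan) {}

    /**
     * Emits one SpanCell per cell that starts a span; continuation cells are
     * absorbed by their starting cell. Assumes rectangular merged regions.
     */
    static List<SpanCell> computeSpans(CellFlags[][] grid) {
        List<SpanCell> out = new ArrayList<>();
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                CellFlags f = grid[r][c];
                if (f.mergedIntoLeft() || f.mergedIntoAbove()) {
                    continue; // covered by a span that started earlier
                }
                // Count continuation cells to the right in the same row.
                int colspan = 1;
                while (c + colspan < grid[r].length
                        && grid[r + 0][c + colspan].mergedIntoLeft()) {
                    colspan++;
                }
                // Count continuation cells below in the same column,
                // guarding against shorter rows.
                int rowspan = 1;
                while (r + rowspan < grid.length
                        && c < grid[r + rowspan].length
                        && grid[r + rowspan][c].mergedIntoAbove()) {
                    rowspan++;
                }
                out.add(new SpanCell(r, c, colspan, rowspan));
            }
        }
        return out;
    }
}
```

Each resulting `SpanCell` would map to one `<td>` in the XHTML output, with `colspan`/`rowspan` attributes written only when the value is greater than 1. Skipping continuation cells, rather than emitting empty ones, is what keeps the HTML grid aligned with the Word table.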
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)