[
https://issues.apache.org/jira/browse/TIKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362177#comment-14362177
]
Tyler Palsulich commented on TIKA-1144:
---------------------------------------
Sorry for letting this one fall off, too, [~kildishev]! Can someone familiar
with Doc parsing take a look at this and TIKA-1140?
> Changes in styling mechanism, inner table support and list support for Word
> Extractor
> -------------------------------------------------------------------------------------
>
> Key: TIKA-1144
> URL: https://issues.apache.org/jira/browse/TIKA-1144
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Denis Kildishev
> Priority: Minor
> Attachments: word_style.patch
>
>
> The mechanisms in the current version of POI can be used to support different
> kinds of styling and list handling. At the moment, Tika supports styling of
> separate Character Runs, but this approach is not ideal and can lead to
> visual glitches in the form of pseudo spaces.
> Another area is lists. Information about them can already be obtained from
> the POI representation, but this mechanism is not used in the current version
> of the Word Extractor.
> A further problem that can also be solved now is that of inner tables.
> It is not directly related to the two problems above, but its solution is
> based on the same mechanism as the solutions to them. An example of incorrect
> handling is a file with a table that contains another table in its first cell.