[
https://issues.apache.org/jira/browse/TIKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denis Kildishev updated TIKA-1144:
----------------------------------
Attachment: (was: word_style.patch)
> Changes in styling mechanism, inner table support and list support for Word
> Extractor
> -------------------------------------------------------------------------------------
>
> Key: TIKA-1144
> URL: https://issues.apache.org/jira/browse/TIKA-1144
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Denis Kildishev
> Priority: Minor
> Attachments: word_style.patch
>
>
> The mechanisms in the current version of POI can be used to support different
> kinds of styling and list handling. At the moment, Tika applies styling to
> each Character Run separately, but this approach is not ideal and can lead to
> visual glitches in the form of pseudo-spaces.
> Another area is lists. Information about them can already be obtained from
> the POI representation, but this mechanism is not used in the current version
> of the Word Extractor.
> A further problem that can be solved now is that of inner (nested) tables.
> It is not directly related to the two problems above, but its solution is
> based on the same mechanism as the solutions to them. An example of incorrect
> handling is a file containing a table whose first cell includes another
> table.