[ 
https://issues.apache.org/jira/browse/TIKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362177#comment-14362177
 ] 

Tyler Palsulich commented on TIKA-1144:
---------------------------------------

Sorry for letting this one fall off, too, [~kildishev]! Can someone familiar 
with Doc parsing take a look at this and TIKA-1140?

> Changes in styling mechanism, inner table support and list support for Word 
> Extractor
> -------------------------------------------------------------------------------------
>
>                 Key: TIKA-1144
>                 URL: https://issues.apache.org/jira/browse/TIKA-1144
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Denis Kildishev
>            Priority: Minor
>         Attachments: word_style.patch
>
>
> Current POI mechanisms can be used to support different kinds of styling and 
> list handling. At the moment, Tika supports styling of separate Character 
> Runs, but this approach is not ideal and can lead to visual glitches in the 
> form of pseudo spaces. 
> Another area is lists. Information about them can already be obtained from 
> the POI representation, but this mechanism is not used in the current version 
> of the Word Extractor.
> Another problem that can be solved now is inner tables. It is not directly 
> related to the two problems above, but its solution is based on the same 
> mechanism. An example of incorrect handling is a file with a table that 
> contains another table in its first cell. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
