[
https://issues.apache.org/jira/browse/TIKA-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560931#comment-13560931
]
Axel Dörfler commented on TIKA-1062:
------------------------------------
Indeed. I have added a few things from reading the RTF 1.9.1 specification, but
I did not manage to produce those fields in an RTF file using either MS Word or
LibreOffice. I also did not manage to create real nested lists, and I did not
want to try to support things I could not test. It shouldn't do any harm to
leave it in there; GroupState.listLevel is currently pretty much unused as well, btw.
If someone can come up with RTF test files that make use of more features, I'm
willing to try to improve the current solution, though.
The changes also have a downside: the parser does not know whether or not the
intention is to output plain text, so lists no longer look like lists in
text-only output that ignores the tags entirely. This is why I added an option
to ignore the lists, as before. It would be nice, eventually, to be able to set
a generic output target (structured or plain text) for a parser.
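A minimal Java sketch of that trade-off, assuming that structured mode drops the literal list marker (the "1." or bullet text that RTF writers typically also store, e.g. in a `\listtext` group, for list-unaware readers) in favor of `<li>` tags. The class, the `renderItem` method and the `ignoreListMarkup` flag are hypothetical names for illustration, not the option actually added by the patch; `stripTags` is a naive stand-in for a text-only consumer that ignores markup.

```java
// Hypothetical sketch: names below are illustrative, not taken from the patch.
class ListOutputModeSketch {

    // markerText: literal marker such as "1.\t" (assumed to come from \listtext).
    static String renderItem(String markerText, String itemText, boolean ignoreListMarkup) {
        if (ignoreListMarkup) {
            // Assumed pre-patch behavior: keep the literal marker, so plain
            // text still reads like a list.
            return "<p>" + markerText + itemText + "</p>";
        }
        // Structured mode: drop the marker and let the <li> tag carry the meaning.
        return "<li>" + itemText + "</li>";
    }

    // Naive stand-in for a text-only consumer that discards all tags.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        // Marker survives in plain text when list markup is ignored.
        System.out.println(stripTags(renderItem("1.\t", "First item", true)));
        // Marker is lost when only the <li> tag signals the list.
        System.out.println(stripTags(renderItem("1.\t", "First item", false)));
    }
}
```

The second line printed is just "First item", which is exactly why a caller that only wants plain text may prefer to switch list detection off.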
> Add list detection to RTFParser
> -------------------------------
>
> Key: TIKA-1062
> URL: https://issues.apache.org/jira/browse/TIKA-1062
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Reporter: Axel Dörfler
> Assignee: Michael McCandless
> Priority: Minor
> Labels: patch
> Attachments: testRTFListLibreOffice.rtf,
> testRTFListMicrosoftWord.rtf, tika-rtf-lists.patch
>
>
> RTF supports lists, and the parser could support those, too, using HTML
> <ul>/<ol>/<li> tags.
> I'm attaching a patch that implements basic support for Word 97 and newer
> lists. Nested lists are not yet supported correctly, though, and a number of
> formatting options are ignored.
> I've also added test cases for this, and adapted existing tests where needed.
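To make the `<ul>`/`<ol>`/`<li>` mapping concrete, here is a minimal Java sketch; it is not the code from tika-rtf-lists.patch. It assumes the paragraphs have already been parsed into a hypothetical `Para` record carrying the list id from the paragraph's `\ls` control word (0 when the paragraph is not in a list) and whether that list is numbered. Per the RTF specification, a level with `\levelnfc23` is a bullet and `\levelnfc255` means no number, so those map to `<ul>` and everything else to `<ol>`. Nesting (`\ilvl`) is ignored, matching the limitation described above.

```java
import java.util.List;

// Hypothetical sketch, NOT the patch's implementation. Nesting (\ilvl) is ignored.
class RtfListSketch {

    // listId: value of the paragraph's \ls control word, 0 = not in a list.
    record Para(int listId, boolean ordered, String text) {}

    // \levelnfc23 = bullet, \levelnfc255 = no number; others are numbered styles.
    static boolean isOrdered(int levelNfc) {
        return levelNfc != 23 && levelNfc != 255;
    }

    // Minimal escaping so item text cannot break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toHtml(List<Para> paras) {
        StringBuilder out = new StringBuilder();
        int openListId = 0;     // id of the list currently open, 0 = none
        String openTag = null;  // "ul" or "ol" while a list is open
        for (Para p : paras) {
            if (p.listId() != openListId) {
                // Leaving or switching lists: close the current one first.
                if (openTag != null) {
                    out.append("</").append(openTag).append('>');
                    openTag = null;
                }
                if (p.listId() != 0) {
                    openTag = p.ordered() ? "ol" : "ul";
                    out.append('<').append(openTag).append('>');
                }
                openListId = p.listId();
            }
            // List paragraphs become <li>, everything else stays a <p>.
            String tag = (openTag != null) ? "li" : "p";
            out.append('<').append(tag).append('>')
               .append(escape(p.text()))
               .append("</").append(tag).append('>');
        }
        if (openTag != null) {
            out.append("</").append(openTag).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Para> doc = List.of(
            new Para(0, false, "Intro"),
            new Para(1, isOrdered(23), "apple"),  // bullet list
            new Para(1, isOrdered(23), "pear"),
            new Para(2, isOrdered(0), "first"),   // Arabic-numbered list
            new Para(0, false, "End"));
        System.out.println(toHtml(doc));
    }
}
```

Running `main` prints `<p>Intro</p><ul><li>apple</li><li>pear</li></ul><ol><li>first</li></ol><p>End</p>`. Keying list boundaries on the `\ls` id rather than on formatting keeps two adjacent but distinct lists from being merged into one.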