[ https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993599#comment-15993599 ]
Tim Allison edited comment on TIKA-2352 at 5/2/17 7:51 PM:
-----------------------------------------------------------
{noformat}
00002FF0 0A 0C D0 08 0A 00 00 C8 00 06 00 00 ..Ð....È....
00003000 0A 00 08 D0 D3 04 0A 00 01 00 01 00 00 00 0A 00 ...ÐÓ...........
00003010 04 D3 C3 0C C3 0A C1 E0 C1 10 EC 13 23 00 C1 20 .ÓÃ.Ã.ÁàÁ.ì.#.Á
00003020 C3 02 C3 Ã.Ã
{noformat}
The dump is followed by the text {{1. INTRODUCTION}}.
It looks like {{C1 E0 C1}} is treated as a complete {{C1}} skip, and then {{EC}} is
interpreted as the start of a variable-length multi-byte function of length
{{23}} (hex, i.e. 35 bytes). Judging from the text that LibreOffice displays, however,
{{EC}} should not be interpreted as the start of a variable-length function.
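To make the suspected misparse concrete, here is a minimal Python sketch that replays bytes 0x3012 to 0x3022 of the dump. It is not Tika's implementation: the helper {{trace_current}} is hypothetical, and its rules are assumptions inferred from this comment. It assumes a byte from 0xC0 to 0xCF starts a function that ends at the next copy of the same byte, and that a byte of 0xD0 or above starts a variable-length function laid out as code, subfunction, then a 2-byte little-endian length of the bytes that follow.

```python
# Hypothetical replay of the suspected misparse; the rules are assumptions, not Tika's code.
DUMP_BASE = 0x3012  # file offset of the first byte below
DUMP = bytes.fromhex("C3 0C C3 0A C1 E0 C1 10 EC 13 23 00 C1 20 C3 02 C3")


def trace_current(data, base):
    """Return (file offset, event) pairs under the assumed current skipping rules."""
    events = []
    pos = 0
    while pos < len(data):
        code = data[pos]
        if 0xC0 <= code <= 0xCF:
            # Assumed rule: the function ends at the next copy of its own code byte.
            end = data.index(code, pos + 1)
            events.append((base + pos, f"skip {code:02X}..{code:02X} ({end - pos + 1} bytes)"))
            pos = end + 1
        elif code >= 0xD0:
            # Assumed layout: code, subfunction, 2-byte little-endian length of the rest.
            sub = data[pos + 1]
            length = int.from_bytes(data[pos + 2:pos + 4], "little")
            events.append((base + pos, f"variable {code:02X} sub {sub:02X} length 0x{length:02X}"))
            pos += 4 + length  # may jump past the end of this excerpt
        else:
            events.append((base + pos, f"byte {code:02X}"))
            pos += 1
    return events


for offset, event in trace_current(DUMP, DUMP_BASE):
    print(f"0x{offset:04X}  {event}")
```

The last line printed is {{0x301A  variable EC sub 13 length 0x23}}. Under these assumptions the {{EC}} claims 35 further bytes, so the parser would resume at 0x3041, skipping {{C1 20 C3 02 C3}} and running into whatever follows the dump.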
I wonder, [~pascal.essiembre], whether {{C1...C1...C1}} could be a valid skip pattern.
If it were, {{EC}} would be enclosed in the skipped content, and we could resume with
{{C3 02 C3}} and then the text.
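The proposed pattern can be checked the same way. In this sketch, the hypothetical helper {{skip_fixed}} skips a function that closes at the n-th repeat of its code byte: {{closers=1}} models the current {{C1 E0 C1}} skip and {{closers=2}} models the {{C1...C1...C1}} pattern suggested above. Again, this is an illustration under assumptions, not Tika's code.

```python
# Hypothetical comparison of the current and proposed C1 skips; not Tika's code.
DUMP_BASE = 0x3016  # file offset of the first C1 in the dump
DUMP = bytes.fromhex("C1 E0 C1 10 EC 13 23 00 C1 20 C3 02 C3")


def skip_fixed(data, pos, closers):
    """Return the index just past a function that ends at the closers-th repeat of its code byte."""
    code = data[pos]
    end = pos
    for _ in range(closers):
        end = data.index(code, end + 1)  # raises ValueError if no repeat exists
    return end + 1


current = skip_fixed(DUMP, 0, closers=1)   # C1 E0 C1
proposed = skip_fixed(DUMP, 0, closers=2)  # C1 E0 C1 10 EC 13 23 00 C1
print(f"current:  resume at 0x{DUMP_BASE + current:04X}: {DUMP[current:].hex(' ').upper()}")
print(f"proposed: resume at 0x{DUMP_BASE + proposed:04X}: {DUMP[proposed:].hex(' ').upper()}")
```

With {{closers=2}} the skip consumes {{C1 E0 C1 10 EC 13 23 00 C1}} and resumes at 0x301F with {{20 C3 02 C3}}, so {{EC}} is never read as a function code. That enclosed span is exactly 9 bytes long, so a rule that skipped a fixed number of bytes for {{C1}} would land in the same place; whether the format defines {{C1}} that way would need to be confirmed against the WordPerfect format documentation.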
was (Author: [email protected]):
{noformat}
00002FF0 0A 0C D0 08 0A 00 00 C8 00 06 00 00 ..Ð....È....
00003000 0A 00 08 D0 D3 04 0A 00 01 00 01 00 00 00 0A 00 ...ÐÓ...........
00003010 04 D3 C3 0C C3 0A C1 E0 C1 10 EC 13 23 00 C1 20 .ÓÃ.Ã.ÁàÁ.ì.#.Á
00003020 C3 02 C3 Ã.Ã
{noformat}
It looks like {{C1 E0 C1}} is a complete {{C1}} skip, then {{EC}} is
interpreted as the start of a variable-length multi-byte function of length
{{23}}, which, judging from the text, is not what it should be.
I wonder, [~pascal.essiembre], whether {{C1...C1...C1}} could be a valid skip pattern.
If it were, {{EC}} would be enclosed in the skipped content, and we could resume with
{{C3 02 C3}} and then the text.
> Incorrect EOF exception in WordPerfect parser
> ---------------------------------------------
>
> Key: TIKA-2352
> URL: https://issues.apache.org/jira/browse/TIKA-2352
> Project: Tika
> Issue Type: Bug
> Reporter: Tim Allison
> Priority: Trivial
> Attachments: 462321.wp
>
>
> We have a few EOF exceptions in WordPerfect files that are likely not
> truncated. The example I'll attach shortly can be opened without
> complaint by LibreOffice.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)