[jira] Created: (TIKA-469) The Parser is not correctly outputting Arabic text documents

Robert Cullen (JIRA) Thu, 22 Jul 2010 08:00:48 -0700

The Parser is not correctly outputting Arabic text documents
------------------------------------------------------------


                 Key: TIKA-469
                 URL: https://issues.apache.org/jira/browse/TIKA-469
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.7
         Environment: Windows XP
            Reporter: Robert Cullen


The parser does not preserve the character encoding when parsing Arabic 
(UTF-8) documents, specifically .pdf and .doc files.  The resulting character 
output is undecipherable or consists only of question-mark symbols.
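
One possible source of question-mark output, which this report does not
confirm as the cause, is that correctly extracted text is later encoded with
a charset that has no Arabic characters, such as a Western default code page
on Windows.  The minimal sketch below uses only the standard Java library,
not Tika.  The class name ArabicEncodingCheck and the helper roundTrip are
hypothetical, and ISO-8859-1 stands in for such a code page as an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ArabicEncodingCheck {

    // Encode the text to bytes with the given charset and decode it back,
    // imitating text that is written to a file or console and read again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Five Arabic letters, written as escapes so the source file's own
        // encoding cannot affect the result.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        // UTF-8 can represent every Arabic letter, so the text survives.
        String viaUtf8 = roundTrip(arabic, StandardCharsets.UTF_8);
        System.out.println(viaUtf8.equals(arabic)); // prints "true"

        // ISO-8859-1 has no Arabic letters; String.getBytes substitutes
        // the charset's replacement byte '?' for each unmappable character.
        String viaLatin1 = roundTrip(arabic, StandardCharsets.ISO_8859_1);
        System.out.println(viaLatin1); // prints "?????"
    }
}
```

If the extracted text is intact in memory and only the written output shows
question marks, writing it with an explicit UTF-8 writer instead of the
platform default charset would be the first thing to try.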

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
