[jira] [Updated] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction

Akash Sudhakar (JIRA) Mon, 12 Sep 2016 01:13:35 -0700

     [ 
https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Akash Sudhakar updated TIKA-2077:
---------------------------------
    Attachment: TestData.docx

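Attached is the test file.
Below is the code I used:
    // Imports: java.io.*, org.apache.tika.metadata.Metadata, org.apache.tika.parser.AutoDetectParser,
    // org.apache.tika.parser.ParseContext, org.apache.tika.sax.BodyContentHandler
    BodyContentHandler handler = new BodyContentHandler();
    AutoDetectParser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    try (InputStream stream = new BufferedInputStream(new FileInputStream(file))) {
        parser.parse(stream, handler, metadata, new ParseContext()); // stream is closed afterwards
    }
    System.out.println(handler.toString()); // extracted body text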
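The report does not say whether the output literally contains eight U+0041 'A' characters or a character that only displays that way in the console. One way to check is to dump the Unicode code points of the extracted text. The following is a minimal sketch; the class `CodePointDump` and its `describe` method are hypothetical helpers, not part of Tika, and use only the JDK.

```java
import java.util.StringJoiner;

/**
 * Hypothetical diagnostic helper (not part of Tika): lists the Unicode code
 * points in a string so that literal 'A' characters (U+0041) can be told
 * apart from other characters that a console or font may render unexpectedly.
 */
public class CodePointDump {

    /** Returns the code points of text as space-separated "U+XXXX" tokens. */
    public static String describe(String text) {
        StringJoiner joiner = new StringJoiner(" ");
        // codePoints() iterates whole code points, so a surrogate pair counts once.
        text.codePoints().forEach(cp -> joiner.add(String.format("U+%04X", cp)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        // In practice, pass handler.toString() from the Tika snippet instead.
        String sample = "x\u2022y";
        System.out.println(describe(sample)); // prints "U+0078 U+2022 U+0079"
    }
}
```

If the dump shows a run of `U+0041`, Tika really emitted the letter A; any other code point points to a display or font issue rather than an extraction issue.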

> Special character extracted as AAAAAAAA in docx file extraction
> ---------------------------------------------------------------
>
>                 Key: TIKA-2077
>                 URL: https://issues.apache.org/jira/browse/TIKA-2077
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Akash Sudhakar
>            Priority: Minor
>         Attachments: TestData.docx
>
>
> When extracting a .docx file with Tika 1.13, a special character is extracted
> as "AAAAAAAA".
> How can this be avoided?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
