[ https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Akash Sudhakar updated TIKA-2077:
---------------------------------
Attachment: TestData.docx
Attached test file.
Below is the code used.
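// Parse the file with Tika's auto-detecting parser and collect the body text.
BodyContentHandler handler = new BodyContentHandler();
AutoDetectParser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
// try-with-resources closes the stream even if parsing fails
try (InputStream stream = new BufferedInputStream(new FileInputStream(file))) {
    parser.parse(stream, handler, metadata);
}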
> Special character extracted as AAAAAAAA in docx file extraction
> ---------------------------------------------------------------
>
> Key: TIKA-2077
> URL: https://issues.apache.org/jira/browse/TIKA-2077
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: Akash Sudhakar
> Priority: Minor
> Attachments: TestData.docx
>
>
> During docx file extraction with Tika 1.13, a special character is extracted
> as AAAAAAAA.
> How can this be avoided?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)