[ https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508986#comment-15508986 ]
Akash Sudhakar commented on TIKA-2077:
--------------------------------------

Thanks, Tim, for your comments. I am not sure where those characters are coming from. Is there a way to remove repeated characters during text extraction in Apache Tika, so that this text can be dropped from the extracted output?

> Special character extracted as AAAAAAAA in docx file extraction
> ---------------------------------------------------------------
>
>                 Key: TIKA-2077
>                 URL: https://issues.apache.org/jira/browse/TIKA-2077
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Akash Sudhakar
>         Attachments: TestData.docx
>
>
> During docx file extraction using Tika 1.13, a special character is extracted
> as AAAAAAAA.
> How can this be avoided?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)