Akash Sudhakar commented on TIKA-2077:

Thanks Tim for your comments.

I am not sure where those characters are coming from. Is there a way to remove 
repeating characters during text extraction in Apache Tika, so that they do not 
appear in the extracted text?
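
One possible workaround, as a minimal sketch rather than a Tika feature: post-process the extracted string and drop runs of identical letters. The class name RepeatedCharFilter, the method stripRuns, and the threshold of 4 are hypothetical names and values chosen for illustration. The input is assumed to be the text Tika returned (for example from the Tika facade's parseToString). This only hides the symptom and does not explain where the characters come from.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Apache Tika): cleans text after extraction.
final class RepeatedCharFilter {
    private RepeatedCharFilter() {}

    /** Removes every run of minRun or more identical letters from text. */
    static String stripRuns(String text, int minRun) {
        if (minRun < 2) {
            throw new IllegalArgumentException("minRun must be at least 2");
        }
        // (\p{L}) captures one letter; \1{n,} requires at least n more copies of it,
        // so the whole match is a run of minRun or more identical letters.
        Pattern runs = Pattern.compile("(\\p{L})\\1{" + (minRun - 1) + ",}");
        return runs.matcher(text).replaceAll("");
    }
}
```

Calling RepeatedCharFilter.stripRuns(extractedText, 4) would remove a run such as AAAAAAAA while leaving ordinary words alone. A lower threshold risks deleting legitimate text, so the value should be checked against real documents.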

> Special character extracted as AAAAAAAA in docx file extraction
> ---------------------------------------------------------------
>                 Key: TIKA-2077
>                 URL: https://issues.apache.org/jira/browse/TIKA-2077
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Akash Sudhakar
>         Attachments: TestData.docx
> During docx file extraction using Tika 1.13, a special character is extracted 
> as AAAAAAAA. How can this be avoided?

This message was sent by Atlassian JIRA