Hi euler
This seems similar to
http://dspace.2283337.n4.nabble.com/Character-encoding-issues-in-Discovery-search-results-tp4675835p4675839.html
Perhaps it can help.
euler wrote on 08/06/15 at 15:00:
Dear All,
I am having issues with the text extraction of PDFs having non-Latin
characters
Dear All,
I am having issues with the text extraction of PDFs having non-Latin
characters and East Asian languages. I tried switching to xpdf from pdfbox's
pdffilter, but it is also not properly extracting the text from the PDF. If I
tried to extract the text from the PDF using the command line
Hi Antoine,
Thanks for the response. I did stumble upon that thread when searching for
a solution. What I discovered was that even though the extracted text is not
showing the proper characters when viewed in the browser, if I download
and open it in a text editor, it is showing the proper