Let's say we have a PDF that uses a bunch of custom font encodings. They look like this in your Font Properties:

[image: image.png]

Notice the fonts marked "encoding: custom". So even though the PDF displays normal-looking Hebrew text such as:

[image: image.png]

when you copy it to the clipboard it looks like this:

©°³ ž ³ž©¤³

That's because the custom encoding does not actually map the font's character codes to Unicode characters, so the extracted text cannot be turned into meaningful UTF-8.

Has anyone heard of a way to magically process these custom encodings and find a reasonable Unicode mapping? I'm not even sure how that would be possible, but I figured I'd reach out and see how y'all out there in the wild have handled custom encodings.

In particular, I want to index my PDFs into Solr, but doing so is completely useless right now because the custom-encoded text gets indexed as complete gibberish.

Any ideas?

-Nicholas
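
P.S. The only approach I can picture so far is a hand-built substitution table per font, applied to the extracted text before it goes to Solr. Here is a minimal Python sketch of that idea, not a tested solution. The names `CUSTOM_TO_HEBREW`, `remap`, and `unmapped` are made up for illustration, and the pairs in the table are placeholders, not the real mapping for my fonts. I have no idea yet what the real mapping is.

```python
# Hypothetical substitution table for ONE custom-encoded font.
# Keys: characters as they come out of text extraction.
# Values: the Hebrew letters they are supposed to be.
# The pairs below are placeholders, NOT the real mapping for my PDF.
CUSTOM_TO_HEBREW = {
    "\u00a9": "\u05d0",  # '©' -> alef  (made-up pairing)
    "\u00b0": "\u05d1",  # '°' -> bet   (made-up pairing)
    "\u00b3": "\u05d2",  # '³' -> gimel (made-up pairing)
}


def remap(text, table=CUSTOM_TO_HEBREW):
    """Replace every character found in the table; leave the rest untouched."""
    return text.translate(str.maketrans(table))


def unmapped(text, table=CUSTOM_TO_HEBREW):
    """List the non-space characters the table does not cover yet."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in table})
```

The table would have to be built by hand, by comparing the rendered page with the extracted text, and `unmapped` is only there to show which characters still need an entry. Something automatic would obviously be nicer.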
