Re: Unicode codepoint conversions

sahy...@fileaffairs.de Wed, 18 Nov 2020 05:57:51 -0800


On Wednesday, 18.11.2020 at 13:58 +0200, Constantine Dokolas wrote:
> I noticed that when writing some codepoints to a PDF and then reading
> the text back from the generated PDF (via PDFTextStripper), some
> conversions happen. For example, the simple hyphen character (0x2D,
> "HYPHEN-MINUS") gets converted to a non-breaking hyphen (0x2011,
> "NON-BREAKING HYPHEN").
>
> Since I'm writing unit tests to verify that everything gets written
> correctly in the PDF from my end (PDF generation), I need to know why,
> when and how these conversions take place (I first noticed them while
> writing some CJK codepoints). Any suggestions/pointers?
>


Could you share a code snippet showing how you are writing and retrieving
the data?
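
To pin down exactly which code points change, something like the
following could be run on the string that was written and the string that
PDFTextStripper returned. This is only a rough sketch using the plain JDK;
the class and method names (CodePointDiff, describe, diff) are made up for
illustration and are not part of PDFBox, and the two sample strings in
main() stand in for the real written and extracted text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: compares two strings code point
// by code point and reports every position where they differ.
public class CodePointDiff {

    /** Formats one code point as "U+XXXX (NAME)". */
    static String describe(int codePoint) {
        String name = Character.getName(codePoint); // null if unassigned
        return String.format("U+%04X (%s)", codePoint,
                name == null ? "unassigned" : name);
    }

    /** Returns one line per position where the two strings differ. */
    static List<String> diff(String expected, String actual) {
        int[] e = expected.codePoints().toArray();
        int[] a = actual.codePoints().toArray();
        List<String> out = new ArrayList<>();
        int n = Math.max(e.length, a.length);
        for (int i = 0; i < n; i++) {
            if (i < e.length && i < a.length && e[i] == a[i]) {
                continue; // identical code point at this position
            }
            String left = i < e.length ? describe(e[i]) : "<missing>";
            String right = i < a.length ? describe(a[i]) : "<missing>";
            out.add("index " + i + ": " + left + " -> " + right);
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder values: replace with the text written to the PDF
        // and the text returned by the extraction step.
        String written = "a-b";
        String extracted = "a\u2011b";
        diff(written, extracted).forEach(System.out::println);
    }
}
```

The comparison is positional, so it is most useful when a conversion
replaces one code point with exactly one other; if the extracted text gains
or loses characters, every later position will be reported as well.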

BR
Maruan 

> Constantine
> --
> There is a computer disease that anybody who works with computers knows
> about. It's a very serious disease and it interferes completely with the
> work. The trouble with computers is that you 'play' with them!
> - Richard P. Feynman



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
