Unicode codepoint conversions

Constantine Dokolas Wed, 18 Nov 2020 03:59:00 -0800

I noticed that when I write some codepoints to a PDF and then read the
text back from the generated PDF (via PDFTextStripper), some of them come
back converted. For example, the simple hyphen character (U+002D,
"HYPHEN-MINUS") comes back as a non-breaking hyphen (U+2011,
"NON-BREAKING HYPHEN").


Since I'm writing unit tests to verify that everything gets written
correctly in the PDF from my end (PDF generation), I need to know why, when
and how these conversions take place (I first noticed them while writing
some CJK codepoints). Any suggestions/pointers?
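
To pin down exactly which codepoints change in such a round-trip test, it
may help to compare the expected and extracted strings code point by code
point. This is only a minimal sketch using the Java standard library; it
assumes the extracted text has already been obtained (for example from
PDFTextStripper), and CodepointDiff, diff and describe are hypothetical
names, not part of PDFBox:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helper (not PDFBox API): lists every position at which
// two strings differ, comparing by code point rather than by UTF-16 char.
final class CodepointDiff {

    // Returns one human-readable line per mismatch; empty if the strings match.
    static List<String> diff(String expected, String actual) {
        int[] exp = expected.codePoints().toArray();
        int[] act = actual.codePoints().toArray();
        List<String> out = new ArrayList<>();

        int common = Math.min(exp.length, act.length);
        for (int i = 0; i < common; i++) {
            if (exp[i] != act[i]) {
                out.add("index " + i + ": " + describe(exp[i])
                        + " -> " + describe(act[i]));
            }
        }
        // Inserted or dropped code points show up as a length difference.
        if (exp.length != act.length) {
            out.add("length: expected " + exp.length
                    + " code points, got " + act.length);
        }
        return out;
    }

    // Formats a code point as "U+XXXX (UNICODE NAME)".
    static String describe(int cp) {
        // Character.getName returns null for unassigned code points.
        String name = Character.getName(cp);
        return String.format("U+%04X (%s)", cp, name == null ? "unassigned" : name);
    }
}
```

In a unit test, CodepointDiff.diff(written, extracted) could be asserted to
be empty, so that a failure message names the converted codepoints instead
of only showing two strings that look identical.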

Constantine
--
There is a computer disease that anybody who works with computers knows
about. It's a very serious disease and it interferes completely with the
work. The trouble with computers is that you 'play' with them!
- Richard P. Feynman
