Package: xpdf-utils
Version: 3.01-9
X-debbugs-cc: [EMAIL PROTECTED]
Severity: normal
File: /usr/bin/pdftotext

What's the deal with the repeated characters (e.g. 遭) in the pdftotext output?

wget http://www.x-net.idv.tw/download/swlfreq/CHNB06.pdf
pdftotext -layout -enc Big5 CHNB06.pdf
Error: No paper information available - using defaults
iconv -f big5 CHNB06.txt|grep 遭
B.B.C. ( 英中廣播漢台 )* 遭大陸遭遭遭遭 B06
V.O.A.   ( 美中中台 )* 遭大陸遭遭遭遭 B06
R.FREE   ASIA ( 自自亞洲漢伊 ) *遭大陸遭遭遭遭 B06
註:RTI遭遭中中大陸遭遭遭遭, 請請請接請請時停請請文冬!

And why the wide characters, when they look narrow in xpdf?

Phew, shook off the wides with
perl -C -pwMText::Unidecode -e 's/\P{Han}+/unidecode($&)/eg'

