Control: retitle -1 licensecheck: misparses utf8-encoded files by default

Quoting Ximin Luo (2017-07-05 18:00:28)
> licensecheck seems to generate bad output for unicode files such as:
> 
> https://sources.debian.net/src/sagemath/7.6-2/sage/src/doc/ja/tutorial/tour_rings.rst
> 
> An example command line is:
> 
> $ licensecheck -l250 --deb-machine --merge-licenses src/doc/ja/tutorial/tour_rings.rst
> 
> I get glyphs like <U+008D>ã<U+0081> suggesting that maybe it is 
> getting utf-8-encoded twice.

Licensecheck reads data as Latin1 by default.
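
To illustrate why reading as Latin1 yields glyphs like those reported, 
here is a minimal Python sketch. It only mimics the symptom and is not 
licensecheck code; the sample character is an arbitrary hiragana letter 
chosen for illustration, not taken from the file in question.

```python
# Illustration only: reproduces the symptom, not licensecheck's internals.
original = "\u304d"  # HIRAGANA LETTER KI, an arbitrary sample character

utf8_bytes = original.encode("utf-8")     # three bytes: E3 81 8D
misread = utf8_bytes.decode("latin-1")    # each byte becomes one character
double_encoded = misread.encode("utf-8")  # what a UTF-8 terminal then receives

print([f"U+{ord(ch):04X}" for ch in misread])  # ['U+00E3', 'U+0081', 'U+008D']
print(double_encoded)                          # b'\xc3\xa3\xc2\x81\xc2\x8d'

# The damage is reversible as long as no bytes were dropped along the way.
repaired = misread.encode("latin-1").decode("utf-8")
```

U+00E3 is "ã", and U+0081 and U+008D are control characters, which 
matches the mix of "ã" and <U+00xx> glyphs in the report.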

Explicitly tell licensecheck to use (or, more accurately, to first try) utf8:

  licensecheck -l250 --deb-machine --merge-licenses --encoding utf8 tour_rings.rst
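
The "first try" behaviour can be pictured roughly as below. This is a 
hedged Python sketch of the general idea, assuming a plain fallback to 
Latin1; the function name decode_with_fallback is hypothetical and this 
is not how licensecheck itself is implemented.

```python
def decode_with_fallback(raw: bytes, preferred: str = "utf-8") -> str:
    """Decode raw bytes with the preferred encoding, else fall back to Latin1.

    Hypothetical helper for illustration; the Latin1 fallback is an assumption.
    """
    try:
        # Strict decoding raises on byte sequences invalid in the encoding.
        return raw.decode(preferred)
    except UnicodeDecodeError:
        # Latin1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1")


# Valid UTF-8 input is decoded as UTF-8.
japanese = decode_with_fallback("\u304d".encode("utf-8"))

# Legacy Latin1 input is not valid UTF-8, so the fallback is used.
french = decode_with_fallback(b"\xe9t\xe9")
```

With such a scheme, valid utf8 files decode correctly while legacy 
Latin1 files still produce readable text.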

I agree that this is not optimal: nowadays licensecheck should use utf8 
by default.  I am just not quite certain how to go about that: whether 
it is ok to simply switch, or whether I should make a minor or major 
version bump when making such a change.

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
