Re: Automatic encoding guessing

Marcin 'Qrczak' Kowalczyk Tue, 23 Oct 2001 13:57:29 -0700

Tue, 23 Oct 2001 18:40:27 +0100 (BST), Markus Kuhn <[EMAIL PROTECTED]> wrote:


>   - You can do a bit more with character and tuple frequency
>     analysis. You need a library of frequency tables of UCS
>     characters and character pairs for various languages (English,
>     German, French, C, Lisp) and their transliterations. Then you
>     try all Something->UCS conversions until you find the one whose
>     resulting histogram best matches a table in the library (read
>     up on "index of coincidence" [Friedman, ~1920] in introductory
>     cryptanalysis textbooks such as Stinson).

I've done this, using frequencies of single letters only. It has
always worked in practice when I needed it.
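
To make the idea concrete, here is a minimal sketch in Python of
single-letter frequency matching. It is not the code from konwert; the
function names (letter_freqs, distance, guess_encoding) and the choice
of a sum-of-squares distance are assumptions made for illustration.

```python
from collections import Counter


def letter_freqs(text):
    """Relative frequency of each letter in text (case-folded)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(letters).items()}


def distance(observed, reference):
    """Sum of squared differences between two frequency tables."""
    keys = set(observed) | set(reference)
    return sum((observed.get(k, 0.0) - reference.get(k, 0.0)) ** 2
               for k in keys)


def guess_encoding(data, candidates, reference):
    """Return the candidate whose decoded histogram is closest to reference.

    Candidates that cannot decode the data at all are skipped.
    Returns None if no candidate decodes.
    """
    best = None
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue
        score = distance(letter_freqs(text), reference)
        if best is None or score < best[0]:
            best = (score, enc)
    return best[1] if best else None
```

The reference table would be built once per language with
letter_freqs() from a reasonably large sample of correctly decoded
text, and guess_encoding() is then called with the raw bytes and a list
of codec names such as "iso8859_2" and "cp1250".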

The program at
<http://qrczak.ids.net.pl/programy/linux/konwert/konwert-1.8.tar.gz>
contains it (it's really old and rusty; I haven't had time to polish it).

Usage: e.g.
    konwert any/pl-iso2
Currently supported languages are cs de el eo es fr he it pl pt ru sv,
each in a couple of encodings. For Latin-based scripts it of course
uses only the frequencies of letters outside the English alphabet.
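
The reason for ignoring English letters is that the candidate 8-bit
encodings all agree on the ASCII range, so only bytes above 127 can
tell them apart. A rough Python sketch of such a scorer follows. It is
an illustration under assumptions, not konwert's actual algorithm: the
names non_ascii_freqs, score and best_encoding, and the fixed penalty
for unexpected characters, are made up here.

```python
from collections import Counter


def non_ascii_freqs(sample):
    """Relative frequencies of the non-ASCII characters in a sample."""
    chars = [ch for ch in sample.lower() if ord(ch) > 127]
    if not chars:
        return {}
    return {ch: n / len(chars) for ch, n in Counter(chars).items()}


def score(data, encoding, weights, penalty=1.0):
    """Score one candidate encoding for the given bytes.

    Each decoded non-ASCII character adds its expected frequency;
    characters the language does not use (including U+FFFD from
    undecodable bytes) subtract a fixed penalty.
    """
    text = data.decode(encoding, errors="replace").lower()
    return sum(weights.get(ch, -penalty) for ch in text if ord(ch) > 127)


def best_encoding(data, candidates, weights):
    """Pick the candidate encoding with the highest score."""
    return max(candidates, key=lambda enc: score(data, enc, weights))
```

Here weights would come from non_ascii_freqs() run over a sample of
text in the expected language. Pure-ASCII input scores zero under every
candidate, which is harmless because all candidates then decode it
identically.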

-- 
 __("<  Marcin Kowalczyk * [EMAIL PROTECTED] http://qrczak.ids.net.pl/
 \__/
  ^^                      SYGNATURA ZASTĘPCZA
QRCZAK

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
