Re: [lingu-dev] Convert and analyse personal dictionaries

Santiago Bosio Wed, 23 Jul 2008 05:31:04 -0700

Michel Weimerskirch wrote:
> Hi
>
> Personal dictionaries (stored in "standard.dic") seem to be in a
> binary format. Is there a tool that can convert them to text-files in
> order to process them?
>
> A large Luxembourg-based company is interested in deploying OOo on a
> number of machines in order to use the Luxembourgish spellchecking
> dictionary I developed. They offered to regularly send me their
> personal dictionaries with the words that the spellchecker doesn't
> recognise yet, so I need a means to streamline the process of
> converting and analysing those.
>   
Michel:


I use this simple C program, which does the trick.
It seems the first 11 bytes are some kind of header, probably stating the
language locale and so on. After that, a 2-byte value indicates how many
bytes long the next word is, followed by the word itself in UTF-8; this
count/word structure repeats for each word in the dictionary.

Here's the simple C code:

/*
 * extraer.c: Extracts the word list from an OpenOffice.org
 *            personal .dic dictionary.
 *
 * To compile the program, run: "gcc -o extraer extraer.c"
 *
 * Usage: "extraer < fichero.dic > listado.txt"
 *
 * (c) 2005, Santiago Bosio.
 * This program is distributed under the GNU GPL license.
 *
 */

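#include <stdio.h>
#include <stdlib.h>

#define MAX_LARGO 100   /* size of the word buffer */

/* Read a 2-byte count stored low byte first (assumed: this is what
 * reading it straight into an int yields on a little-endian PC).
 * Reading the bytes one at a time keeps the result independent of
 * the host's byte order. Returns -1 at end of file or on a short read. */
static long read_count (FILE *f)
{
    int lo = getc (f);
    int hi = getc (f);

    if ( lo == EOF || hi == EOF )
        return -1;
    return (long) lo | ((long) hi << 8);
}

int main (void)
{
    long largo;
    unsigned char palabra[MAX_LARGO];

    /* Skip the first eleven bytes: header */
    if ( fread (palabra, sizeof(unsigned char), 11, stdin) < 11 )
    {
        fprintf (stderr, "Error: not a valid dictionary.\n");
        exit (1);
    }

    if ( (largo = read_count (stdin)) < 0 )
    {
        fprintf (stderr, "The dictionary contains no words.\n");
        exit (1);
    }

    do
    {
        if ( largo > MAX_LARGO ) /* Skip overlong words (errors) */
        {
            fprintf (stderr, "Error: word too long.\n");
            /* Skip by reading rather than fseek(), which fails on pipes */
            while ( largo-- > 0 && getchar () != EOF )
                ;
        }
        else
        {
            if ( fread (palabra, 1, (size_t) largo, stdin) < (size_t) largo )
            {
                fprintf (stderr, "Error: truncated dictionary.\n");
                exit (1);
            }
            fwrite (palabra, 1, (size_t) largo, stdout);
            putchar ('\n');
        }
    }
    while ( (largo = read_count (stdin)) >= 0 );

    return (0);
}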
Hope this helps you. Best regards,

Santiago.
