On 12 November 2011 22:11, Francis Tyers <[email protected]> wrote:
> On Sat, 12 Nov 2011 at 21:15 +0000, Kevin Donnelly wrote:
>> (c) Instead of devising an interface to the current format, devise upstream
>> tools for populating a grid format.
>> --------------------------------------------------
>
> What I think here is that we can make the grid format a way of coding
> data that will later be converted into lttoolbox format. Really, no-one
> should be editing XML, it should be just an intermediate layer between
> the grid, and the binary format.

I'm surprised that Fran didn't mention that we already have what is
effectively a grid format for generating monolingual dictionaries, via
the speling format.

$ cat test.txt
house; house; sg; n;
house; houses; pl; n;
index; index; sg; n;
index; indices; pl; n;
index; indexes; pl; n;
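
For anyone who wants to read such a file themselves, here is a minimal
Python sketch. It assumes each line is "lemma; surface form; tags; part
of speech;" as in test.txt above; the function name parse_speling is
made up for illustration and is not part of apertium-dixtools.

```python
from collections import OrderedDict


def parse_speling(text):
    """Group speling lines by (lemma, part of speech), keeping file order."""
    entries = OrderedDict()
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(";")]
        # The trailing ';' produces one empty field at the end; drop it.
        if fields and fields[-1] == "":
            fields.pop()
        if len(fields) != 4:
            raise ValueError("line %d: expected 4 fields: %r" % (number, raw))
        lemma, surface, tags, pos = fields
        entries.setdefault((lemma, pos), []).append((surface, tags))
    return entries


if __name__ == "__main__":
    sample = "house; house; sg; n;\nhouse; houses; pl; n;\n"
    for (lemma, pos), forms in parse_speling(sample).items():
        print(lemma, pos, forms)
```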

$ java -jar apertium-dixtools.jar speling -standard test.txt out.dix
Lemma: house
Lemma: house
Lemma: index
Lemma: index
Lemma: index
house
index
Writing file out.dix

$ cat out.dix

<?xml version="1.0" encoding="UTF-8"?>
<!--
        Dictionary:
        Sections: 1
        Entries: 2
        Sdefs: 3
        Paradigms: 2
        Last processed by: apertium-dixtools speling -standard test.txt out.dix

-->
<dictionary>
  <alphabet>hHoOuUsSeEiInNdDxXcC</alphabet>
  <sdefs>
    <sdef n="n" />
    <sdef n="sg" />
    <sdef n="pl" />
  </sdefs>
  <pardefs>
    <pardef n="house__n">
    <e>
      <p>
        <l></l>
        <r><s n="n"/><s n="sg"/></r>
      </p>
    </e>

    <e>
      <p>
        <l>s</l>
        <r><s n="n"/><s n="pl"/></r>
      </p>
    </e>

    </pardef>
    <pardef n="ind/ex__n">
    <e>
      <p>
        <l>ex</l>
        <r>ex<s n="n"/><s n="sg"/></r>
      </p>
    </e>

    <e>
      <p>
        <l>ices</l>
        <r>ex<s n="n"/><s n="pl"/></r>
      </p>
    </e>

    <e>
      <p>
        <l>ices</l>
        <r>ex<s n="n"/><s n="pl"/></r>
      </p>
    </e>

    <e r="LR">
      <p>
        <l>exes</l>
        <r>ex<s n="n"/><s n="pl"/></r>
      </p>
    </e>

    </pardef>
  </pardefs>

  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
    <e lm="index"><i>ind</i><par n="ind/ex__n"/></e>
  </section>
</dictionary>
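
The paradigm names and suffixes in out.dix are consistent with taking
the longest common prefix of the lemma and all of its forms as the
stem. The sketch below reproduces "house__n" and "ind/ex__n" on that
assumption; it is a guess at the technique, not the dixtools code, and
split_paradigm is a hypothetical name.

```python
import os


def split_paradigm(lemma, forms, pos):
    """Return (stem, paradigm name, suffixes) for one lemma.

    The stem is the longest common prefix of the lemma and its forms;
    whatever remains of each form becomes the left side of a paradigm entry.
    """
    # os.path.commonprefix compares character by character, so it works
    # on arbitrary strings, not only on file paths.
    stem = os.path.commonprefix([lemma] + list(forms))
    ending = lemma[len(stem):]
    if ending:
        name = "%s/%s__%s" % (stem, ending, pos)
    else:
        name = "%s__%s" % (lemma, pos)
    suffixes = [form[len(stem):] for form in forms]
    return stem, name, suffixes


if __name__ == "__main__":
    print(split_paradigm("house", ["house", "houses"], "n"))
    print(split_paradigm("index", ["index", "indices", "indexes"], "n"))
```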

There is also a Python version, but I mention the Java version because
1) it creates valid dictionaries, by inferring <sdefs> and <alphabet>
from the input; 2) it inserts direction restrictions for alternate
forms, based on order of declaration; and 3) I wrote it :)
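
To illustrate points 1 and 2, here is a rough Python sketch of how the
alphabet, the symbol definitions and the r="LR" restriction could be
derived from rows in declaration order. The details are assumptions
chosen to match the sample output above (lower case before upper case
in the alphabet, part of speech before tags in the sdefs, "." as the
separator for multiple tags); infer_metadata is a hypothetical name and
this is not the dixtools implementation.

```python
def infer_metadata(rows):
    """rows: (lemma, surface, tags, pos) tuples in declaration order.

    Returns (alphabet string, list of sdef names, list of
    (surface, restriction) pairs where restriction is "LR" or None).
    """
    alphabet = []
    sdefs = []
    seen = set()
    restricted = []
    for lemma, surface, tags, pos in rows:
        # Alphabet: every new character, lower case first, then upper case.
        for ch in surface:
            for variant in (ch.lower(), ch.upper()):
                if variant not in alphabet:
                    alphabet.append(variant)
        # Symbol definitions: part of speech first, then each tag.
        for symbol in [pos] + [t for t in tags.split(".") if t]:
            if symbol not in sdefs:
                sdefs.append(symbol)
        # A second form for an analysis already declared is an alternate:
        # mark it LR so it is analysed but never generated.
        key = (lemma, pos, tags)
        restricted.append((surface, "LR" if key in seen else None))
        seen.add(key)
    return "".join(alphabet), sdefs, restricted


if __name__ == "__main__":
    rows = [
        ("index", "index", "sg", "n"),
        ("index", "indices", "pl", "n"),
        ("index", "indexes", "pl", "n"),
    ]
    print(infer_metadata(rows))
```

Because the first declared form wins, swapping the "indices" and
"indexes" lines in the input would make "indexes" the generated plural.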

I had started work on a bilingual version, that took input in the form
"word; paradigm; word; paradigm", and was to update all three
dictionaries at once, but ran into a problem in getting proper bidix
entries from the set of tags in the paradigms (it worked most of the
time, but not *all* the time).

It would be trivial to have a format like "word; tags; word; tags" for
updating just the bidix, though.
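
A minimal Python sketch of that idea, turning one "word; tags; word;
tags" line into a bidix <e> element. Since the format does not exist
yet, everything here is an assumption: the "." separator between tags,
the function name bidix_entry, and the sample line "house; n; casa;
n.f" are all invented for illustration. Multiword entries (which need
<b/> for spaces) are not handled.

```python
from xml.sax.saxutils import escape, quoteattr


def bidix_entry(line):
    """Build a bidix <e> element from 'word; tags; word; tags'."""
    fields = [field.strip() for field in line.split(";")]
    # Tolerate an optional trailing ';' as in the speling format.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != 4:
        raise ValueError("expected 4 fields: %r" % line)
    left_word, left_tags, right_word, right_tags = fields

    def side(element, word, tags):
        # One <s n="..."/> per tag, in the order given.
        symbols = "".join(
            "<s n=%s/>" % quoteattr(tag) for tag in tags.split(".") if tag
        )
        return "<%s>%s%s</%s>" % (element, escape(word), symbols, element)

    return "<e><p>%s%s</p></e>" % (
        side("l", left_word, left_tags),
        side("r", right_word, right_tags),
    )


if __name__ == "__main__":
    print(bidix_entry("house; n; casa; n.f"))
```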


-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
