Re: [Apertium-stuff] Lint Checker Ideas for GSOC

Kevin Brubeck Unhammer Thu, 22 Mar 2012 00:14:38 -0700

Aaron Rubin <[email protected]> writes:

> Hi all,
>
> I've spoken with Francis a few times over e-mail and been on IRC a bit,
> but I don't think I've introduced myself to the whole listhost. I'm a
> third-year student at the University of Chicago, majoring in linguistics
> with a minor in Comp Sci. Most of my programming experience is doing
> various analyses of text files in C, so it seemed that of all the project
> ideas, the lint tester for suspicious constructs in .dix files would be
> the best for me (I thought about proposing a Japanese-English language
> pair, but Google does a fairly OK job with Japanese as it is, and there's
> no way I could surpass that in three months). I've already written a
> duplicate tag checker in C and sent it out to the listhost earlier today,
> and I've been thinking about how I'd implement some of the other
> suggestions on the lint tester ideas page, as well as a few ideas of my
> own. The problem, though, is that I'm not sure how I'd be able to fill up
> the whole summer doing it! This is my tentative schedule:
>
> Week 1: Redundant Entry Finder
> Week 2: Testing Full Entries in Lemmas where Part of the Lemma is
>         Specified by the Pardef
> Week 3: Testing Misspelled Tags and Pardefs
> Week 4: Testing Incompatible Tags (multiple gender tags instead of
>         combined tags for nouns of ambiguous gender, multiple number
>         tags, a "noun" and "adj" tag on the same entry)
> Week 5: Testing Tag Missing on One Side of Translation Equivalents (a
>         "noun" tag on the English side, but not on the Spanish side)
> Week 6: Testing Missing Gender on Gendered Languages (this would be an
>         intricate one... I'd have to investigate which of the languages
>         in the language pairs have gender or noun class systems and have
>         the program take that into account)
>
> But not all of those would necessarily take up a week, and there's no way
> that all of this will take 12 weeks! So I've been thinking about common
> errors that might show up in transfer rules files, but nothing's really
> come to mind. Has anyone else noticed common mistakes in .dix or transfer
> rules files that would be suitable for this kind of program to look for?


Say you're editing a transfer file that has 

    <def-attr n="a_det">
      <attr-item tags="det"/> 
      <attr-item tags="det.emph"/>
      <attr-item tags="det.dem"/>
      <attr-item tags="det.itg"/>
      <attr-item tags="det.qnt"/>
      <attr-item tags="det.pos"/>
    </def-attr>
    …
    <not>
     <equal>
      <clip pos="1" side="tl" part="a_det"/>   <lit v=""/>
     </equal>
    </not>

(i.e. the <equal> on its own matches when it's not a determiner at all)
and you want to make it a more specific requirement, like "it has to be
the tag sequence <det><pos>". It's easy to leave out the "-tag" of
<lit-tag> and write

    <not>
     <equal>
      <clip pos="1" side="tl" part="a_det"/>   <lit v="det.pos"/>
     </equal>
    </not>

where the correct version would be

    <not>
     <equal>
      <clip pos="1" side="tl" part="a_det"/>   <lit-tag v="det.pos"/>
     </equal>
    </not>

It's just as easy to write a typo like det.poss, which would never match
since it's not defined in a_det. Here the lint checker could give a
warning whenever a test compares a clip defined by a def-attr against
anything other than 1) the empty string, 2) a tag or tag sequence from
that def-attr, or 3) a variable.

There are also built-in clip parts that are not defined in any def-attr,
like "lemh", "lemq" and "lem"; these can be compared with empty or
non-empty <lit>s, but never with tags.

I guess you could also do the same for <begins-with> instead of <equal>.
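
A rough sketch of that check in Python, just to show the idea. The
function name lint_clip_tests and the sample fragment are made up for
illustration, the fragment is not a complete transfer file, and only the
three built-in parts named above are treated as text-only:

```python
import xml.etree.ElementTree as ET

# Built-in clip parts named above; they hold text, never tags.
TEXT_PARTS = {"lem", "lemh", "lemq"}


def lint_clip_tests(xml_text):
    """Return warnings for suspicious <equal>/<begins-with> clip tests."""
    root = ET.fromstring(xml_text)
    # Map each def-attr name to the set of tag sequences it declares.
    attrs = {
        d.get("n"): {i.get("tags", "") for i in d.iter("attr-item")}
        for d in root.iter("def-attr")
    }
    warnings = []
    for test in root.iter():
        if test.tag not in ("equal", "begins-with"):
            continue
        operands = list(test)
        for clip in (o for o in operands if o.tag == "clip"):
            part = clip.get("part")
            for other in operands:
                value = other.get("v", "")
                if other.tag == "lit" and part in attrs and value:
                    # Non-empty plain literal compared with a tag clip.
                    warnings.append(
                        f'{part}: <lit v="{value}"/> used, did you mean <lit-tag>?'
                    )
                elif other.tag == "lit-tag" and part in TEXT_PARTS:
                    warnings.append(f'{part}: text clip compared with tag "{value}"')
                elif other.tag == "lit-tag" and part in attrs:
                    known = attrs[part]
                    if test.tag == "begins-with":
                        # A prefix of a declared tag sequence is fine here.
                        ok = any(
                            t == value or t.startswith(value + ".") for t in known
                        )
                    else:
                        ok = value in known
                    if not ok:
                        warnings.append(f'{part}: tag "{value}" is not in the def-attr')
    return warnings


# Hypothetical fragment, only for trying the function out.
SAMPLE_CLIPS = """
<transfer>
  <section-def-attrs>
    <def-attr n="a_det">
      <attr-item tags="det"/>
      <attr-item tags="det.pos"/>
    </def-attr>
  </section-def-attrs>
  <section-rules>
    <equal><clip pos="1" side="tl" part="a_det"/><lit v=""/></equal>
    <equal><clip pos="1" side="tl" part="a_det"/><lit v="det.pos"/></equal>
    <equal><clip pos="1" side="tl" part="a_det"/><lit-tag v="det.poss"/></equal>
    <equal><clip pos="1" side="tl" part="a_det"/><lit-tag v="det.pos"/></equal>
    <equal><clip pos="1" side="tl" part="lemh"/><lit-tag v="det"/></equal>
  </section-rules>
</transfer>
"""

for warning in lint_clip_tests(SAMPLE_CLIPS):
    print(warning)
```

Clip-against-clip and clip-against-variable tests are deliberately left
alone, since those can't be judged without running the rules.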


You could probably also warn about an <in> test whose list can never
match the clip:

    <in>
     <clip part="a_det"/>
     <list n="some-list-that-is-disjoint-from-a_det"/>
    </in>
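
Something along these lines might do for the disjoint-list warning. It
is only a sketch: lint_disjoint_lists and the sample are invented here,
and how list items spell tag values is an assumption, so both the dotted
form (det.pos) and the angle-bracket form (<det><pos>) are normalised to
the dotted one before comparing:

```python
import xml.etree.ElementTree as ET


def normalize(tags):
    """Turn '<det><pos>' into 'det.pos'; dotted input is left unchanged."""
    return tags.strip("<>").replace("><", ".")


def lint_disjoint_lists(xml_text):
    """Warn about <in> tests whose list shares nothing with the clip's def-attr."""
    root = ET.fromstring(xml_text)
    attrs = {
        d.get("n"): {normalize(i.get("tags", "")) for i in d.iter("attr-item")}
        for d in root.iter("def-attr")
    }
    lists = {
        d.get("n"): {normalize(i.get("v", "")) for i in d.iter("list-item")}
        for d in root.iter("def-list")
    }
    warnings = []
    for test in root.iter("in"):
        clip, lst = test.find("clip"), test.find("list")
        if clip is None or lst is None:
            continue
        part, name = clip.get("part"), lst.get("n")
        # Only judge pairs where both the def-attr and the def-list are known.
        if part in attrs and name in lists and attrs[part].isdisjoint(lists[name]):
            warnings.append(f'<in>: list "{name}" can never match clip part "{part}"')
    return warnings


# Hypothetical fragment; the list names are made up.
SAMPLE_LISTS = """
<transfer>
  <section-def-attrs>
    <def-attr n="a_det">
      <attr-item tags="det"/>
      <attr-item tags="det.pos"/>
    </def-attr>
  </section-def-attrs>
  <section-def-lists>
    <def-list n="verbs"><list-item v="vblex"/><list-item v="vbser"/></def-list>
    <def-list n="dets"><list-item v="det.pos"/></def-list>
  </section-def-lists>
  <section-rules>
    <in><clip pos="1" side="tl" part="a_det"/><list n="verbs"/></in>
    <in><clip pos="1" side="tl" part="a_det"/><list n="dets"/></in>
  </section-rules>
</transfer>
"""

for warning in lint_disjoint_lists(SAMPLE_LISTS):
    print(warning)
```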


And then there's calling a macro with the wrong number of arguments; the
various vm-for-transfer compilers do this check, but the standard one
does not, so it wouldn't hurt to put it in.
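
The argument-count check could look roughly like this, assuming the
usual layout where <def-macro> declares its arity in npar and each
<call-macro> passes one <with-param> per argument. The function name
lint_macro_calls and the macro names in the sample are made up:

```python
import xml.etree.ElementTree as ET


def lint_macro_calls(xml_text):
    """Warn about call-macro elements whose argument count differs from npar."""
    root = ET.fromstring(xml_text)
    # Declared arity per macro name; a missing npar is treated as 0 here.
    npar = {m.get("n"): int(m.get("npar", "0")) for m in root.iter("def-macro")}
    warnings = []
    for call in root.iter("call-macro"):
        name = call.get("n")
        given = len(call.findall("with-param"))
        if name not in npar:
            warnings.append(f'call-macro "{name}": no such def-macro')
        elif given != npar[name]:
            warnings.append(
                f'call-macro "{name}": {given} argument(s) given, {npar[name]} expected'
            )
    return warnings


# Hypothetical fragment, only for trying the function out.
SAMPLE_MACROS = """
<transfer>
  <section-def-macros>
    <def-macro n="f_concord" npar="2"/>
  </section-def-macros>
  <section-rules>
    <call-macro n="f_concord"><with-param pos="1"/><with-param pos="2"/></call-macro>
    <call-macro n="f_concord"><with-param pos="1"/></call-macro>
    <call-macro n="f_missing"><with-param pos="1"/></call-macro>
  </section-rules>
</transfer>
"""

for warning in lint_macro_calls(SAMPLE_MACROS):
    print(warning)
```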


-Kevin


