Re: [Apertium-stuff] Lint Checker Ideas for GSOC

Aaron Rubin Mon, 26 Mar 2012 14:18:01 -0700

I've adjusted the plan quite a bit - it now gives more time to transfer
rules and checks for a few other problems that I thought might come up. How
does it look?

Weeks 1-5, .dix files:
Week 1: Redundant Entry Finder
Week 2: Testing Full Entries in Lemmas where Part of the Lemma is Specified
by the Pardef; Testing Misspelled Tags and Pardefs
Week 3: Testing Incompatible Tags; Testing Tag Missing on One Side of
Translation Equivalents (in bilingual dictionaries)
Week 4: Testing Missing Gender on Gendered Languages (in bilingual
dictionaries)
Week 5: Bundling features together in one program; re-organizing code, and
writing documentation, to make sure that everything is as neat and
maintainable as possible. Combining tests from previous weeks into a single
testing program so that all features can be tested at once when the code is
modified in the future.
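
As a rough illustration of the Week 1 task, a redundant entry finder might
look something like the sketch below. It is only a sketch under stated
assumptions: "redundant" is taken to mean two <e> entries in the same
<section> that are identical once whitespace is ignored, and the function
names and the sample dictionary are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def canonical(elem):
    """Hashable form of an element: tag, sorted attributes, text, children with tails."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple((canonical(child), (child.tail or "").strip()) for child in elem),
    )


def find_redundant_entries(dix_xml):
    """Return (section id, lm attribute, count) for each entry repeated within a section.

    Assumption: entries are redundant only if they are structurally identical.
    """
    root = ET.fromstring(dix_xml)
    report = []
    for section in root.iter("section"):
        counts = Counter(canonical(e) for e in section.findall("e"))
        for key, n in counts.items():
            if n > 1:
                lm = dict(key[1]).get("lm", "")
                report.append((section.get("id", ""), lm, n))
    return report


# Hypothetical sample dictionary, not taken from any real language pair.
SAMPLE = """<dictionary>
  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
    <e lm="cat"><i>cat</i><par n="house__n"/></e>
    <e lm="house"><i>house</i><par n="house__n"/></e>
  </section>
</dictionary>"""

print(find_redundant_entries(SAMPLE))
```

A real checker would probably also want to flag near-duplicates (for example,
the same lemma and paradigm with different restrictions), which this sketch
does not attempt.
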
Weeks 6-12, transfer rules:
Week 6: Checking inappropriate uses of <equal>, <begins-with>,
<ends-with>, and <let> in transfer rules (equating a tag with a non-empty
string literal, etc.)
Week 7: Checking for cases where the user asks for nonexistent tags with
lit-tag v="some_tag" (always an error) or for a string literal with lit
v="some_string" that is identical to a tag (suspicious and very likely an
error).
Week 8: Checking for undefined tags after attr-item in attribute
definitions, probably due to spelling errors. Checking for calls to
anything other than a defined attribute, lem, lemh, lemq, whole, or tags
after part= in a clip.
Week 9: Checking for patterns that refer to non-existent categories,
probably due to spelling errors. Checking for misspelled variables.
Week 10: Checking for an untagged chunk (e.g., in the rule "HACE NUM NOM" in
apertium-en-es.en-es.t1x, forgetting to give the resulting chunk the tag
"adverb", which seems like a conceivable mistake to me). Checking for
an incorrect number of arguments in macro calls.
Week 11: Checking for missing <test> after <when> and for non-boolean
arguments to <test>, <and>, <not>, and <or> (unless the compiler already
checks for that sort of thing?). Testing missing lemma queue after lemma
head.
Week 12: Bundling all features together into one program (note that this
program would need to take a suitable dictionary file, in addition to a
transfer rules file, as input to determine the set of valid tags).
Re-organizing code, and writing documentation, to make sure that everything
is as neat and maintainable as possible. Combining tests from previous
weeks into a single testing program so that all features can be tested at
once when the code is modified in the future.
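
To give a flavour of the transfer-rule checks, the sketch below covers two
of them: patterns that refer to undefined categories (Week 9) and macro
calls with the wrong number of arguments (Week 10). It assumes the usual
t1x layout, where <def-cat n="..."> defines a category, <pattern-item
n="..."/> refers to one, <def-macro n="..." npar="..."> declares a macro,
and <call-macro> passes one <with-param> per argument. The function name,
the warning wording, and the sample rule file are all made up for the
example.

```python
import xml.etree.ElementTree as ET


def check_transfer_rules(t1x_xml):
    """Return a list of warning strings for a transfer rules file."""
    root = ET.fromstring(t1x_xml)
    warnings = []

    # Week 9: every pattern-item should name a category defined by def-cat.
    categories = {d.get("n") for d in root.iter("def-cat")}
    for item in root.iter("pattern-item"):
        if item.get("n") not in categories:
            warnings.append(
                f"pattern-item refers to undefined category '{item.get('n')}'"
            )

    # Week 10: each call-macro should pass as many with-param children
    # as the npar attribute of the matching def-macro declares.
    macro_params = {m.get("n"): int(m.get("npar", "0")) for m in root.iter("def-macro")}
    for call in root.iter("call-macro"):
        name = call.get("n")
        given = len(call.findall("with-param"))
        if name not in macro_params:
            warnings.append(f"call to undefined macro '{name}'")
        elif given != macro_params[name]:
            warnings.append(
                f"macro '{name}' expects {macro_params[name]} argument(s), got {given}"
            )
    return warnings


# Hypothetical sample with one misspelled category and one short macro call.
SAMPLE = """<transfer>
  <section-def-cats>
    <def-cat n="nom"><cat-item tags="n.*"/></def-cat>
  </section-def-cats>
  <section-def-macros>
    <def-macro n="f_concord" npar="2"></def-macro>
  </section-def-macros>
  <section-rules>
    <rule>
      <pattern><pattern-item n="nmo"/></pattern>
      <action><call-macro n="f_concord"><with-param pos="1"/></call-macro></action>
    </rule>
  </section-rules>
</transfer>"""

for warning in check_transfer_rules(SAMPLE):
    print(warning)
```

The tag-related checks (Weeks 7 and 8) would additionally need the set of
valid tags from a dictionary file, which this sketch leaves out.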

-Aaron

On Fri, Mar 23, 2012 at 4:22 AM, Francis Tyers wrote:

> On Fri, 23 Mar 2012 at 10:27 +0100, Jacob Nordfalk wrote:
> >
> >
> > 2012/3/23 Francis Tyers
> >         On Thu, 22 Mar 2012 at 20:33 -0400, Aaron Rubin wrote:
> >         > Thanks for the suggestions, everyone! This is my tentative
> >         > schedule, as of now:
> >         >
> >         > Weeks 1-7, .dix files:
> >         > Week 1: Redundant Entry Finder
> >         > Week 2: Testing Full Entries in Lemmas where Part of the
> >         > Lemma is Specified by the Pardef
> >         > Week 3: Testing Misspelled Tags and Pardefs
> >         > Week 4: Testing Incompatible Tags
> >         > Week 5: Testing Tag Missing on One Side of Translation
> >         > Equivalents
> >         > Week 6: Testing Missing Gender on Gendered Languages
> >         > Week 7: Bundling all of these features together in one
> >         > program; testing.
> >         > Weeks 8-10, Transfer rules:
> >         > Week 8: Checking inappropriate uses of <equal>,
> >         > <begins-with>, <ends-with>, and <let> in transfer rules.
> >         > Perhaps contains substring (<cmp substr>) and <in> as well?
> >         > I'm having a bit of trouble figuring out where and why those
> >         > two are used.. if someone could point me to a tutorial page
> >         > with an illustrative example, I'd appreciate it. The same for
> >         > <begins-with> and <ends-with>, for that matter.
> >         > Week 9: Checking for cases where the user asks for
> >         > nonexistent tags.
> >         > Week 10: Checking for incorrect number of arguments in calls
> >         > to macro
> >         > (Weeks 9 and 10 will probably take less than a week, but
> >         > Week 8's task might be intricate enough to compensate)
> >         > Week 11-12: Bundling all features together into one program.
> >         > Possibly combining with .dix files checker, with a feature to
> >         > check which type of file is being input. Writing and running
> >         > tests (adding deliberate errors to sample .dix and transfer
> >         > rules files to see whether the program catches them). Writing
> >         > documentation to ensure that code is maintainable.
> >
> >
> >         It seems that the plan is time skewed in favour of .dix files
> >         (imho the easier task). If anything I would say that 7 weeks
> >         on transfer and 2 weeks on dictionaries seems more sensible.
> >
> >         I think that it might be a good idea to go through the
> >         language pair HOWTO, and see what kind of errors/pitfalls you
> >         come across that aren't handled by the validation programs.
> >
> >
> > I'd also suggest that you set time aside to get into the problem
> > domain if you haven't already done so:
> >
> >
> > 1) work a little on a pair
> > 2) interview/observe someone who has just started working on a pair
> >
> >
> > Writing a system which is supposed to help others, especially
> > beginners, will be better written by someone who has experienced the
> > obstacles themselves.
>
> Excellent advice!
>
> Fran
>
>
>


