Hey Francis,
Have you considered adding the corpus-based paradigm matching idea you'd
had as a project? I'm not sure the whole thing would be complex enough on
its own to justify making it part of GSoC, but I suppose it could be built
upon, and potentially made pretty useful (and accurate).
Cheers,
Vinit
On 12 February 2015 at 16:23, Kevin Brubeck Unhammer <[email protected]> wrote:
> Per Tunedal <[email protected]>
> writes:
>
> > Hi Francis,
> > I really like the idea "Make a program which tests Apertium data files
> > for suspicious or unrecommended constructs (likely to be bugs)." For
> > someone like me it's very easy to make a minor mistake when editing
> > those bloody XML files :-) It's quite easy to miss a quotation mark (")
> > or some other symbol (<>) that isn't all that important in ordinary
> > language, or to omit a closing symbol at the right side of the
> > expression (/>).
>
> You can catch those by typing "make", but you really should get a
> validating XML editor that tells you straight away.
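A minimal sketch of the kind of well-formedness check discussed above (missing quotes, unclosed tags), assuming Python 3 and only the standard-library `xml.parsers.expat` module. `check_well_formed` is a hypothetical helper for illustration, not part of Apertium's tooling. It reports the line and column where the parser gave up, which for a missing quote or `/>` can be somewhat after the actual typo. It does not do DTD validation; that is what xmllint (via apertium-validate-dictionary) is for.

```python
import xml.parsers.expat
from typing import Optional


def check_well_formed(text: str) -> Optional[str]:
    """Return None if `text` is well-formed XML, else "line:column: message".

    Only well-formedness is checked here; DTD validation is a separate step.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # isfinal=True: treat `text` as the complete document.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError as err:
        # err.lineno is 1-based; err.offset is a 0-based column.
        message = xml.parsers.expat.ErrorString(err.code)
        return f"{err.lineno}:{err.offset}: {message}"
    return None
```

For example, `check_well_formed(open("apertium-xxx.xxx.dix").read())` (file name hypothetical) returns something like `"2:3: mismatched tag"` for a broken file and `None` for a well-formed one.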
>
> > One way to improve checking would be not just to have separate programs
> > like Jimmy O'Regan's lint tool for tsx files, but also to make the make
> > scripts more explicit about errors: give some helpful hints about common
> > errors, and print the offending line (or rather the offending
> > expression?) with explicit info. This applies to make scripts for
> > dictionaries as well as for tagger training.
>
> Validation on make has gotten a bit better lately; Hrvoje added an XSD
> that prints some more info and catches some more bugs. But if you've got
> a good XML editor, most errors should be caught long before that.
>
> I notice some bugs get unhelpful line numbers from
> apertium-validate-dictionary, e.g. if you add some text between two
> <e>'s, the line number will point at the <section> tag which is often
> thousands of lines away – printing the offending line wouldn't help
> here.
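The stray-text case can be located more precisely by walking the parse events directly. A minimal sketch, again assuming Python 3's standard-library expat bindings; `find_stray_text` and its `container` parameter are hypothetical, and this is not how apertium-validate-dictionary works. It flags non-whitespace text whose immediate parent is `<section>` and reports the line where that text starts, rather than the line of the enclosing `<section>` tag.

```python
import xml.parsers.expat
from typing import List, Tuple


def find_stray_text(text: str, container: str = "section") -> List[Tuple[int, str]]:
    """Return (line, text) for non-blank character data placed directly
    inside a <container> element, e.g. stray text between two <e> entries."""
    parser = xml.parsers.expat.ParserCreate()
    stack: List[str] = []  # names of currently open elements
    found: List[Tuple[int, str]] = []

    def start(name, attrs):
        stack.append(name)

    def end(name):
        stack.pop()

    def chars(data):
        # Expat may deliver text in several chunks; each non-blank chunk
        # whose immediate parent is the container is reported with the
        # line it starts on.
        if stack and stack[-1] == container and data.strip():
            found.append((parser.CurrentLineNumber, data.strip()))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return found
```

This complements rather than replaces DTD validation: it only knows about this one kind of mistake, but it points at the right line for it.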
>
> In Emacs I just hit "next error" and it goes straight to the line with
> the error and says "text not allowed here". There's a list of XML
> editors at http://wiki.apertium.org/wiki/XML_editors if you want to give
> any of them a go (if anyone has other options to recommend, please add
> them!). If you're not already into Emacs/vim, I'd recommend XML Copy
> Editor: it does syntax highlighting and well-formedness checking out of
> the box, and setting up validation isn't hard.
>
> Unfortunately, none of the non-Emacs editors I've tried can go straight
> to the offending line for all the various DTD errors I've thrown at them,
> but maybe I've just had bad luck. (From what I can tell, vim users tend
> to use xmllint, which is what apertium-validate-dictionary uses.)
>
> Perhaps we should make a dix-lint that just calls up Emacs =P
> Unfortunately, Emacs-based validation is a bit slower than xmllint on
> big files :-)
>
>
> --
> Kevin Brubeck Unhammer
>
> GPG: 0x766AC60C
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is
> your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
>