Per Tunedal <[email protected]>
writes:

> Hi Francis,
> I really like the idea "Make a program which tests Apertium data files
> for suspicious or unrecommended constructs (likely to be bugs). " For
> someone like me it's very easy to make a minor mistake when editing
> those bloody XML-files :-) It's quite easy to miss a quotation mark ( ")
> or some other symbols (<>) that aren't all that important in ordinary
> language. Or omitting some closing symbols at the right side of the
> expression (/>).

You can catch those by typing "make", but you really should get a
validating XML editor that tells you straight away.
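
If you want that check outside of make and outside an editor, here is a
minimal sketch using only Python's standard library. It checks
well-formedness only (missing quotes, unclosed tags and the like), not
validity against the dictionary DTD, and it is not what "make" runs.
The function name first_wellformedness_error is made up for
illustration.

```python
import xml.etree.ElementTree as ET


def first_wellformedness_error(xml_text):
    """Return (line, column, message) for the first syntax error, or None.

    Hypothetical helper: it only checks that the text is well-formed XML.
    It knows nothing about the Apertium DTDs.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == "__main__":
    # The closing quotation mark after n="n is missing on purpose.
    broken = '<e>\n<p><l>a</l><r>b<s n="n/></r></p>\n</e>'
    print(first_wellformedness_error(broken))
```

The parser stops at the first error, so you fix one mistake and run it
again, much like "next error" in an editor.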

> One way of improved checking would be not to just have separate programs
> like Jimmy O'Regan's lint-tool for tsx-files, but also make the make
> script be more explicit about errors. Some helpful hints about common
> errors. Print the offending line with explicit info. Or rather the
> offending expression? This applies to make scripts for dictionaries as
> well as for tagger training.

Validation on make has gotten a bit better lately; Hrvoje added an XSD
that prints some more info and catches some more bugs. But if you've got
a good XML editor, most errors should be caught long before that.

I notice some bugs get unhelpful line numbers from
apertium-validate-dictionary. For example, if you add some text between
two <e>'s, the reported line number points at the <section> tag, which
is often thousands of lines away, so printing the offending line
wouldn't help here.
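
For that particular case, a small script can find the real position. The
following is a rough sketch, assuming the stray text sits directly inside
<section>; it uses Python's expat bindings and is not part of
apertium-validate-dictionary. The function name stray_text_lines is
hypothetical.

```python
import xml.parsers.expat


def stray_text_lines(xml_text, parent="section"):
    """Return (line, text) pairs for non-whitespace text directly in <parent>.

    Hypothetical helper: it only looks for character data whose immediate
    parent element is `parent`; it does no DTD validation.
    """
    stack = []  # names of the currently open elements
    hits = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        stack.append(name)

    def end(name):
        stack.pop()

    def chars(data):
        # Ignore indentation and newlines; report anything else.
        if stack and stack[-1] == parent and data.strip():
            hits.append((parser.CurrentLineNumber, data.strip()))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return hits


if __name__ == "__main__":
    sample = (
        '<section id="main" type="standard">\n'
        "<e><p><l>a</l><r>b</r></p></e>\n"
        "oops\n"
        "<e><p><l>c</l><r>d</r></p></e>\n"
        "</section>\n"
    )
    for line, text in stray_text_lines(sample):
        print(f"line {line}: text not allowed here: {text!r}")
```

Text inside <l>, <r> and so on is left alone, since only the innermost
open element is compared against the parent name.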

In Emacs I just hit "next error" and it goes straight to the line with
the error and says "text not allowed here". There's a list of XML
editors at http://wiki.apertium.org/wiki/XML_editors if you want to give
any of them a go (if anyone has other options to recommend, please add
them!). I think I'd recommend using XML Copy Editor if you're not
already into emacs/vim: it does syntax highlighting and well-formedness
checking out of the box, and setting up validation isn't hard.

Unfortunately, none of the non-Emacs editors I've tried have the ability
to go straight to the offending line on all the various DTD errors I've
tried, but maybe I've just had bad luck. (From what I can tell, vim
users tend to use xmllint, which is what apertium-validate-dictionary
uses.)

Perhaps we should make a dix-lint that just calls up emacs =P
Unfortunately emacs-based validation is a bit slower than xmllint on big
files :-)


-- 
Kevin Brubeck Unhammer

GPG: 0x766AC60C


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
