Per Tunedal <[email protected]> writes:

> Hi Francis,
> I really like the idea "Make a program which tests Apertium data files
> for suspicious or unrecommended constructs (likely to be bugs). " For
> someone like me it's very easy to make a minor mistake when editing
> those bloody XML-files :-) It's quite easy to miss a quotation mark ( ")
> or some other symbols (<>) that aren't all that important in ordinary
> language. Or omitting some closing symbols at the right side of the
> expression (/>).

You can catch those by typing "make", but you really should get a
validating XML editor that tells you straight away.

> One way of improved checking would be not to just have separate programs
> like Jimmy O'Regan's lint-tool for tsx-files, but also make the make
> script be more explicit about errors. Some helpful hints about common
> errors. Print the offending line with explicit info. Or rather the
> offending expression? This applies to make scripts for dictionaries as
> well as for tagger training.

Validation on make has gotten a bit better lately; Hrvoje added an XSD
that prints some more info and catches some more bugs. But if you've got
a good XML editor, most errors should be caught long before that.

I notice some bugs get unhelpful line numbers from
apertium-validate-dictionary, e.g. if you add some text between two
<e>'s, the line number will point at the <section> tag which is often
thousands of lines away – printing the offending line wouldn't help
here. In Emacs I just hit "next error" and it goes straight to the line
with the error and says "text not allowed here".

There's a list of XML editors at
http://wiki.apertium.org/wiki/XML_editors if you want to give any of
them a go (if anyone has other options to recommend, please add them!).
I think I'd recommend using XML Copy Editor if you're not already into
emacs/vim, it does syntax highlighting and well-formedness checking out
of the box, and setting up validation isn't hard.

Unfortunately, none of the non-Emacs editors I've tried have the ability
to go straight to the offending line on all the various DTD errors I've
tried, but maybe I've just had bad luck. (From what I can tell, vim
users tend to use xmllint, which is what apertium-validate-dictionary
uses.)

Perhaps we should make a dix-lint that just calls up emacs =P
Unfortunately emacs-based validation is a bit slower than xmllint on big
files :-)

--
Kevin Brubeck Unhammer
GPG: 0x766AC60C
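For the simplest class of mistakes mentioned above (a missing quotation mark or an unclosed tag), here is a rough sketch of what "print the offending line" could look like. It uses only the Python standard library and checks well-formedness only; it does not do DTD or XSD validation, so it would not catch errors like stray text between two <e>'s. The function name check_well_formed and the sample dictionary entry are made up for illustration and are not part of any Apertium tool.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else a short report.

    The report gives the line and column reported by the parser and
    echoes the offending line. Hypothetical helper, not an Apertium API.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, col = err.position  # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        offending = lines[line - 1] if 0 < line <= len(lines) else ""
        return f"line {line}, column {col}: {err}\n    {offending}"
    return None


# Made-up dictionary fragment with a missing closing quote on n="n"
broken = (
    "<dictionary>\n"
    '  <e><p><l>hus</l><r>house<s n="n/></r></p></e>\n'
    "</dictionary>\n"
)
report = check_well_formed(broken)
if report:
    print(report)
```

This only reports the first error the parser hits, and the position is wherever the parser gave up, which is not always where the typo is.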
_______________________________________________ Apertium-stuff mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/apertium-stuff
