Re: prototype of new rule editor
On 2014-04-03 21:51, Daniel Naber wrote:
> On 2014-04-03 21:27, Marcin Miłkowski wrote:
>> The URL box is overly sensitive to the format of URLs, and does not accept, for example, this one: http://poradnia.pwn.pl/lista.php?id=9687
>
> Thanks, will be fixed with the next deployment (it was a Chrome-only issue).

Another one: adding a second example (under Chrome) creates incorrect XML:

  <example type='incorrect'>Obiecano odmrożenie <marker>zablokowanych aktyw</marker> reżimu. Nie tkniesz moich aktyw!</example>
  <example type='correct'>Obiecano odmrożenie zablokowanych aktywów reżimu.</example>
  <example type='incorrect'>Nie tkniesz moich aktyw!</example>

Note that "Nie tkniesz moich aktyw!" was added as a second example. It is then concatenated to the first example, and there's no marker in the example either.

Regards,
Marcin

--
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
External rule files
Hi all,

I added some preliminary, very sketchy support for adding rules to the existing rule base (rather than replacing it). This is a first step towards user-level rules.

It is a bit rough around the edges, as I had no idea how to subclass a class returned by:

  Language.getLanguageForShortName()

Does anyone have an idea how to do this? Google didn't help ;)

I also added a new (currently unused) attribute to the rules element. The idea is simple: if you have integrate="add", the rules will be added to the existing ones rather than replacing them (integrate="replace", the default value?). Maybe there's room for a third value, for when you want to use the existing language with its tokenization, tagger and all, but you don't want to use its rules (integrate="replace_only_rules").

I will add the code that supports the attribute as soon as we're clear which values we need. Any ideas?

I think this is related to the STE term checker, which could probably benefit from the third option, although I'm not sure why Mike Unwalla doesn't want to use our disambiguation rules. If someone needs a special scenario in which disambiguation rules are added as well, we could have a special .zip format with two files: grammar and disambiguation. This is, however, a bit more complex.

Regards,
Marcin
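To make the three proposed values easier to discuss, here is a minimal sketch of how the rule list could be assembled for each of them. Everything in it is an assumption for illustration: the class IntegrateSketch, the Mode enum, fromAttribute() and effectiveRuleIds() are hypothetical names, not LanguageTool code, and rules are reduced to plain ID strings. It treats "replace" as the default, which the email only suggests tentatively, and it does not model how "replace" and "replace_only_rules" differ in their use of the tokenizer and tagger, since that is still open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IntegrateSketch {

  /** Hypothetical modes for the proposed "integrate" attribute. */
  enum Mode {
    ADD, REPLACE, REPLACE_ONLY_RULES;

    static Mode fromAttribute(String value) {
      if (value == null) {
        return REPLACE; // assumed default; the email leaves this open
      }
      switch (value) {
        case "add": return ADD;
        case "replace": return REPLACE;
        case "replace_only_rules": return REPLACE_ONLY_RULES;
        default: throw new IllegalArgumentException("Unknown integrate value: " + value);
      }
    }
  }

  /** Returns the rule IDs that would be active for the given mode. */
  static List<String> effectiveRuleIds(Mode mode, List<String> builtIn, List<String> external) {
    List<String> result = new ArrayList<>();
    if (mode == Mode.ADD) {
      result.addAll(builtIn); // keep the existing rule base
    }
    result.addAll(external);  // external rules are used in every mode
    return result;
  }

  public static void main(String[] args) {
    List<String> builtIn = Arrays.asList("BUILTIN_1", "BUILTIN_2");
    List<String> external = Arrays.asList("USER_1");
    System.out.println(effectiveRuleIds(Mode.fromAttribute("add"), builtIn, external));
    System.out.println(effectiveRuleIds(Mode.fromAttribute(null), builtIn, external));
  }
}
```

With "add" the external rules are appended after the built-in ones; with the other two values only the external rules remain.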
Re: prototype of new rule editor
On 2014-04-04 10:32, Marcin Miłkowski wrote:
> On 2014-04-03 21:51, Daniel Naber wrote:
>> On 2014-04-03 21:27, Marcin Miłkowski wrote:
>>> The URL box is overly sensitive to the format of URLs, and does not accept, for example, this one: http://poradnia.pwn.pl/lista.php?id=9687
>>
>> Thanks, will be fixed with the next deployment (it was a Chrome-only issue).
>
> Another one: adding a second example (under Chrome) creates incorrect XML:
>
>   <example type='incorrect'>Obiecano odmrożenie <marker>zablokowanych aktyw</marker> reżimu. Nie tkniesz moich aktyw!</example>
>   <example type='correct'>Obiecano odmrożenie zablokowanych aktywów reżimu.</example>
>   <example type='incorrect'>Nie tkniesz moich aktyw!</example>
>
> Note that "Nie tkniesz moich aktyw!" was added as a second example. It is then concatenated to the first example, and there's no marker in the example either.

And when parsing this XML:

  <!-- Polish rule, 2014-04-04 -->
  <rule id="ID" name="dd">
    <pattern case_sensitive='yes'>
      <token inflected='yes' regexp='yes'>\p{Lu}\p{Ll}+</token>
      <token>-</token>
      <token inflected='yes'>zdrój<exception inflected='yes'>Zdrój</exception></token>
    </pattern>
    <message>W dwuczłonowych nazwach miast oba człony piszemy wielką literą: <suggestion><match no="1"/>-<match no="3" case_conversion="startupper"/></suggestion></message>
    <url>https://pl.wikipedia.org/wiki/Pomoc:Powszechne_błędy_językowe</url>
    <short>Błąd pisowni wielką i małą literą</short>
    <example type='incorrect'>Mieszkam <marker>Rabce-zdroju</marker>.</example>
    <example type='correct'>Mieszkam w Rabce-Zdroju.</example>
  </rule>

I get reports that the rule is problematic. It is not.

Also, updating the examples is buggy: when I changed the examples to create a second rule (a variant of the first one), the old example stayed in the XML.

Regards,
Marcin
Re: XML element and attribute statistics
Well, moving the rules to a common file would defeat the purpose of the split, especially as more and more rules start to use unification...

I looked into the code a bit more, and it looks like we disable validation when we load rules in LT. That's why loading at run-time works fine; I guess Java then references the unifications that are already loaded.

I'll keep looking for a good solution,
Andriy

2014-04-03 12:23 GMT-04:00 Daniel Naber daniel.na...@languagetool.org:
> On 2014-04-03 01:12, Andriy Rysin wrote:
>> I guess we have two ways to go from here: adjust the tests to load files and keep them (I am not sure how easy it is - depends on how flexible our XMLValidator is)
>
> We're just using standard XML validation, I don't think there's much we can do (other than catching that specific exception, which would be very ugly). But not many rules are affected, what about moving those to grammar.xml? (I know, that's not very elegant either).
>
> Regards
> Daniel
Re: XML element and attribute statistics
On 4 April 2014 18:29, Andriy Rysin ary...@gmail.com wrote:
> Well moving the rules to common file will defeat the purpose of the split, especially when more and more rules will use unification...

In a similar scenario, I create a temp common file, validate with that, then rm the temp file. Would that suit both uses?

HTH
DaveP

> I looked in the code a bit more and it looks that we disable validation when we load rules in LT, that's why loading in run-time is working fine, and then I guess Java references the unifications that are already loaded.
>
> I'll keep looking for good solution,
> Andriy
>
> 2014-04-03 12:23 GMT-04:00 Daniel Naber daniel.na...@languagetool.org:
>> On 2014-04-03 01:12, Andriy Rysin wrote:
>>> I guess we have two ways to go from here: adjust the tests to load files and keep them (I am not sure how easy it is - depends on how flexible our XMLValidator is)
>>
>> We're just using standard XML validation, I don't think there's much we can do (other than catching that specific exception, which would be very ugly). But not many rules are affected, what about moving those to grammar.xml? (I know, that's not very elegant either).
>>
>> Regards
>> Daniel

--
Dave Pawson
XSLT XSL-FO FAQ. Docbook FAQ.
http://www.dpawson.co.uk
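The temp-file idea could be sketched roughly as follows. This is only an illustration under stated assumptions: TempFileValidation, TOY_XSD, merge() and validateViaTempFile() are hypothetical names, and the schema is a toy one (not LanguageTool's real XSD) that uses a key/keyref pair to stand in for "a rule refers to a unification declared elsewhere". The merge is a naive string insertion right after the opening rules tag. Only standard JAXP calls are used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class TempFileValidation {

  // Toy schema (an assumption, NOT LanguageTool's XSD): each <rule uses="..."/>
  // must refer to a <unification feature="..."/> declared in the same document.
  static final String TOY_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
    + "<xs:element name='rules'>"
    + "<xs:complexType><xs:sequence>"
    + "<xs:element name='unification' minOccurs='0' maxOccurs='unbounded'>"
    + "<xs:complexType><xs:attribute name='feature' type='xs:string' use='required'/></xs:complexType>"
    + "</xs:element>"
    + "<xs:element name='rule' minOccurs='0' maxOccurs='unbounded'>"
    + "<xs:complexType><xs:attribute name='uses' type='xs:string' use='required'/></xs:complexType>"
    + "</xs:element>"
    + "</xs:sequence></xs:complexType>"
    + "<xs:key name='featureKey'><xs:selector xpath='unification'/><xs:field xpath='@feature'/></xs:key>"
    + "<xs:keyref name='featureRef' refer='featureKey'><xs:selector xpath='rule'/><xs:field xpath='@uses'/></xs:keyref>"
    + "</xs:element>"
    + "</xs:schema>";

  /** Validates against the toy schema; returns false on a validation error. */
  static boolean isValid(Source xml) throws IOException, SAXException {
    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema schema = factory.newSchema(new StreamSource(new StringReader(TOY_XSD)));
    try {
      schema.newValidator().validate(xml);
      return true;
    } catch (SAXException validationError) {
      return false;
    }
  }

  static boolean isValidXml(String xml) throws IOException, SAXException {
    return isValid(new StreamSource(new StringReader(xml)));
  }

  /** Naive merge: inserts the unification block right after the opening <rules> tag. */
  static String merge(String unificationXml, String rulesXml) {
    int insertAt = rulesXml.indexOf('>', rulesXml.indexOf("<rules")) + 1;
    return rulesXml.substring(0, insertAt) + unificationXml + rulesXml.substring(insertAt);
  }

  /** Writes the merged document to a temp file, validates it, then removes the file. */
  static boolean validateViaTempFile(String unificationXml, String rulesXml)
      throws IOException, SAXException {
    Path tmp = Files.createTempFile("grammar-merged", ".xml");
    try {
      Files.write(tmp, merge(unificationXml, rulesXml).getBytes(StandardCharsets.UTF_8));
      return isValid(new StreamSource(tmp.toFile()));
    } finally {
      Files.deleteIfExists(tmp); // the "rm the temp file" step
    }
  }

  public static void main(String[] args) throws Exception {
    String unification = "<unification feature=\"number\"/>";
    String splitFile = "<rules><rule uses=\"number\"/></rules>";
    System.out.println("split file alone valid: " + isValidXml(splitFile));
    System.out.println("merged temp file valid: " + validateViaTempFile(unification, splitFile));
  }
}
```

The split files on disk stay untouched, so the purpose of the split is kept; only the validation step sees the merged document. Whether the real schema fails for the same reason as this toy one is an assumption.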
Re: prototype of new rule editor
On 2014-04-04 10:32, Marcin Miłkowski wrote:
> Another one: adding a second example (under Chrome) creates incorrect XML:

Can you specify the exact workflow that leads to this? I couldn't reproduce so far.

Regards
Daniel
Re: XML element and attribute statistics
On 04/03/2014 12:23 PM, Daniel Naber wrote:
> On 2014-04-03 01:12, Andriy Rysin wrote:
>> I guess we have two ways to go from here: adjust the tests to load files and keep them (I am not sure how easy it is - depends on how flexible our XMLValidator is)
>
> We're just using standard XML validation, I don't think there's much we can do (other than catching that specific exception, which would be very ugly). But not many rules are affected, what about moving those to grammar.xml? (I know, that's not very elegant either).

Here's the patch for the solution that I think should be acceptable: for multiple grammar files, when validating we extract all unification elements from the first file and prepend them to the rest of the files.

Advantages:
* only tests are affected by this change
* only languages with multiple grammar XML files are affected
* low overhead (re-including only the elements we need)

I would appreciate any feedback,
Thanks
Andriy

diff --git a/languagetool-core/src/test/java/org/languagetool/XMLValidator.java b/languagetool-core/src/test/java/org/languagetool/XMLValidator.java
index e113dbb..ce9c6d4 100644
--- a/languagetool-core/src/test/java/org/languagetool/XMLValidator.java
+++ b/languagetool-core/src/test/java/org/languagetool/XMLValidator.java
@@ -27,15 +27,22 @@ import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 import javax.xml.XMLConstants;
+import javax.xml.parsers.DocumentBuilder;
+import javax.xml.parsers.DocumentBuilderFactory;
 import javax.xml.parsers.ParserConfigurationException;
 import javax.xml.parsers.SAXParser;
 import javax.xml.parsers.SAXParserFactory;
+import javax.xml.transform.Source;
+import javax.xml.transform.dom.DOMSource;
 import javax.xml.transform.stream.StreamSource;
 import javax.xml.validation.Schema;
 import javax.xml.validation.SchemaFactory;
 import javax.xml.validation.Validator;
 
 import org.languagetool.tools.StringTools;
+import org.w3c.dom.Document;
+import org.w3c.dom.Node;
+import org.w3c.dom.NodeList;
 import org.xml.sax.InputSource;
 import org.xml.sax.SAXException;
 import org.xml.sax.SAXParseException;
@@ -123,6 +130,61 @@ public final class XMLValidator {
 
   /**
    * Validate XML file using the given XSD. Throws an exception on error.
+   * @param baseFilename File to prepend common parts (unification) from before validating main file
+   * @param filename File in classpath to validate
+   * @param xmlSchemaPath XML schema file in classpath
+   */
+  public void validateWithXmlSchema(String baseFilename, String filename, String xmlSchemaPath) throws IOException {
+    try {
+      final InputStream xmlStream = this.getClass().getResourceAsStream(filename);
+      final InputStream baseXmlStream = this.getClass().getResourceAsStream(baseFilename);
+      if (xmlStream == null || baseXmlStream == null ) {
+        throw new IOException("File not found in classpath: " + filename);
+      }
+      try {
+        final URL schemaUrl = this.getClass().getResource(xmlSchemaPath);
+        if (schemaUrl == null) {
+          throw new IOException("XML schema not found in classpath: " + xmlSchemaPath);
+        }
+        validateInternal(mergeIntoSource(baseXmlStream, xmlStream, this.getClass().getResource(xmlSchemaPath)), schemaUrl);
+      } finally {
+        xmlStream.close();
+      }
+    } catch (Exception e) {
+      throw new IOException("Cannot load or parse '" + filename + "'", e);
+    }
+  }
+
+
+  private static Source mergeIntoSource(InputStream baseXmlStream, InputStream xmlStream, URL xmlSchema) throws Exception {
+    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
+    domFactory.setIgnoringComments(true);
+    domFactory.setValidating(false);
+    domFactory.setNamespaceAware(true);
+
+    //SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
+    //Schema schema = sf.newSchema(xmlSchema);
+    //domFactory.setSchema(schema);
+
+    DocumentBuilder builder = domFactory.newDocumentBuilder();
+    Document baseDoc = builder.parse(baseXmlStream);
+    Document ruleDoc = builder.parse(xmlStream);
+
+    // Shall this be more generic, i.e. reuse not just unification ???
+    NodeList unificationNodes = baseDoc.getElementsByTagName("unification");
+    Node ruleNode = ruleDoc.getElementsByTagName("rules").item(0);
+    Node firstChildRuleNode = ruleNode.getChildNodes().item(1);
+
+    for(int i=0; i<unificationNodes.getLength(); i++) {
+      Node unificationNode = ruleDoc.importNode(unificationNodes.item(i), true);
+      ruleNode.insertBefore(unificationNode, firstChildRuleNode);
+    }
+
+    return new DOMSource(ruleDoc);
+  }
+
+  /**
+   * Validate XML file using the given XSD. Throws an exception on error.
    * @param xml the XML string to be validated
    * @param xmlSchemaPath XML schema file in classpath
    * @since 2.3
@@ -171,6 +233,14 @@ public final class XMLValidator {
     validator.validate(new StreamSource(xml));
   }
+  private void