Re: prototype of new rule editor

2014-04-04 Thread Marcin Miłkowski
On 2014-04-03 21:51, Daniel Naber wrote:
 On 2014-04-03 21:27, Marcin Miłkowski wrote:

 The URL box is overly sensitive to the format of URLs, and does not
 accept for example this one:

 http://poradnia.pwn.pl/lista.php?id=9687

 Thanks, will be fixed with the next deployment (it was a Chrome-only
 issue).

Another one: adding a second example (under Chrome) creates incorrect XML:

<example type='incorrect'>Obiecano odmrożenie <marker>zablokowanych
aktyw</marker> reżimu. Nie tkniesz moich aktyw!</example>
  <example type='correct'>Obiecano odmrożenie zablokowanych aktywów
reżimu.</example>
  <example type='incorrect'>Nie tkniesz moich aktyw!</example>

Note that "Nie tkniesz moich aktyw!" was added as a second example. It
then gets concatenated to the first example, and the new example has no
marker either.

Regards,
  Marcin

--
___
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel


External rule files

2014-04-04 Thread Marcin Miłkowski
Hi all,

I added some preliminary, very sketchy support for adding (rather than 
replacing) rules to the existing rule base. This is a first step towards 
user-level rules.

This is a bit rough around the edges, as I had no idea how to subclass
the class returned by:

Language.getLanguageForShortName()

Does anyone have an idea how to do this? Google didn't help ;)
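
One common way around subclassing an object whose concrete class is only known at run time is delegation: wrap the returned object and forward calls to it. The sketch below only illustrates that pattern. The `Lang` interface, `BuiltIn`, `WithExtraRules` and `lookup` are hypothetical stand-ins, not LanguageTool's real API, and the real `Language` type may need a different approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DelegationSketch {

  /** Hypothetical stand-in for the type returned by the lookup; not LanguageTool's real API. */
  interface Lang {
    String getShortName();
    List<String> getRuleFileNames();
  }

  /** Hypothetical built-in language with a single rule file. */
  static final class BuiltIn implements Lang {
    private final String shortName;

    BuiltIn(String shortName) {
      this.shortName = shortName;
    }

    public String getShortName() {
      return shortName;
    }

    public List<String> getRuleFileNames() {
      return Arrays.asList("grammar.xml");
    }
  }

  /** Wraps any Lang, appends extra rule files and forwards everything else unchanged. */
  static final class WithExtraRules implements Lang {
    private final Lang delegate;
    private final List<String> extraRuleFiles;

    WithExtraRules(Lang delegate, List<String> extraRuleFiles) {
      this.delegate = delegate;
      this.extraRuleFiles = extraRuleFiles;
    }

    public String getShortName() {
      return delegate.getShortName();
    }

    public List<String> getRuleFileNames() {
      List<String> all = new ArrayList<>(delegate.getRuleFileNames());
      all.addAll(extraRuleFiles);
      return all;
    }
  }

  /** Hypothetical lookup; the caller does not know the concrete class it gets back. */
  static Lang lookup(String shortName) {
    return new BuiltIn(shortName);
  }

  public static void main(String[] args) {
    Lang polish = new WithExtraRules(lookup("pl"), Arrays.asList("user-rules.xml"));
    System.out.println(polish.getShortName() + ": " + polish.getRuleFileNames());
  }
}
```

The wrapper never needs to name the concrete class, which is the point. If the real type is an abstract class rather than an interface, the wrapper would have to extend it and override each method it forwards.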

I also added a new (currently unused) attribute to the <rules> element.
The idea is simple: with integrate="add", the rules are added to the
existing ones rather than replacing them (integrate="replace", the default
value?). Maybe there is room for a third value, for when you want to use the
existing language with its tokenization, tagger and all, but not its rules
(integrate="replace_only_rules"). I will add the code that supports the
attribute values as soon as we're clear which ones we need.

Any ideas? I think this is related to the STE term checker, which could
probably benefit from the third option -- I'm not sure why Mike Unwalla
doesn't want to use our disambiguation rules, however. But if someone
needs a special scenario in which disambiguation rules are added, we
could have a special .zip format with two files: grammar and
disambiguation. This is, however, a bit more complex.
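
To make the three proposed values easier to compare, here is a minimal sketch of how they could behave. It is my reading of the proposal, not code from the repository: the `Integrate` enum, the `Setup` class and the `apply` method are hypothetical names, and rules are reduced to plain id strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IntegrateSketch {

  /** Hypothetical names mirroring the attribute values discussed above. */
  enum Integrate { ADD, REPLACE, REPLACE_ONLY_RULES }

  /** Outcome of loading an external file: active rule ids, and whether the
   *  built-in tokenizer/tagger stay in use. */
  static final class Setup {
    final List<String> rules;
    final boolean keepsBuiltInResources;

    Setup(List<String> rules, boolean keepsBuiltInResources) {
      this.rules = rules;
      this.keepsBuiltInResources = keepsBuiltInResources;
    }
  }

  static Setup apply(Integrate mode, List<String> builtIn, List<String> external) {
    switch (mode) {
      case ADD: {
        // External rules are appended to the existing rule base.
        List<String> all = new ArrayList<>(builtIn);
        all.addAll(external);
        return new Setup(all, true);
      }
      case REPLACE_ONLY_RULES:
        // Only the rules are swapped; tokenization and tagging are kept.
        return new Setup(new ArrayList<>(external), true);
      default:
        // REPLACE: assumed here to drop the built-in resources as well.
        return new Setup(new ArrayList<>(external), false);
    }
  }

  public static void main(String[] args) {
    List<String> builtIn = Arrays.asList("BUILT_IN_1", "BUILT_IN_2");
    List<String> external = Arrays.asList("USER_1");
    for (Integrate mode : Integrate.values()) {
      Setup s = apply(mode, builtIn, external);
      System.out.println(mode + " -> " + s.rules + ", built-in resources kept: " + s.keepsBuiltInResources);
    }
  }
}
```

Whether "replace" should really drop the tokenizer and tagger is exactly the open question; the sketch just fixes one interpretation so the difference from the third value is visible.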

Regards,
Marcin



Re: prototype of new rule editor

2014-04-04 Thread Marcin Miłkowski
On 2014-04-04 10:32, Marcin Miłkowski wrote:
 On 2014-04-03 21:51, Daniel Naber wrote:
 On 2014-04-03 21:27, Marcin Miłkowski wrote:

 The URL box is overly sensitive to the format of URLs, and does not
 accept for example this one:

 http://poradnia.pwn.pl/lista.php?id=9687

 Thanks, will be fixed with the next deployment (it was a Chrome-only
 issue).

 Another one: adding a second example (under Chrome) creates incorrect XML:

 <example type='incorrect'>Obiecano odmrożenie <marker>zablokowanych
 aktyw</marker> reżimu. Nie tkniesz moich aktyw!</example>
 <example type='correct'>Obiecano odmrożenie zablokowanych aktywów
 reżimu.</example>
 <example type='incorrect'>Nie tkniesz moich aktyw!</example>

 Note that Nie tkniesz moich aktyw was added as a second example. Then
 it is concatenated to the 1st example, and there's no marker in the
 example either.

And when parsing this XML:

<!-- Polish rule, 2014-04-04 -->
<rule id="ID" name="dd">
  <pattern case_sensitive='yes'>
   <token inflected='yes' regexp='yes'>\p{Lu}\p{Ll}+</token>
   <token>-</token>
   <token inflected='yes'>zdrój<exception
inflected='yes'>Zdrój</exception></token>
  </pattern>
  <message>W dwuczłonowych nazwach miast oba człony piszemy wielką
literą: <suggestion><match no="1"/>-<match no="3"
case_conversion="startupper"/></suggestion></message>
  <url>https://pl.wikipedia.org/wiki/Pomoc:Powszechne_błędy_językowe</url>
  <short>Błąd pisowni wielką i małą literą</short>
  <example type='incorrect'>Mieszkam
<marker>Rabce-zdroju</marker>.</example>
  <example type='correct'>Mieszkam w Rabce-Zdroju.</example>
</rule>

I get reports that the rule is problematic. It is not.

Also, updating the examples is buggy: when I changed examples to create 
a second rule (a variant of the first one), the old example stayed in 
the XML.

Regards,
Marcin



Re: XML element and attribute statistics

2014-04-04 Thread Andriy Rysin
Well, moving the rules to a common file would defeat the purpose of the
split, especially as more and more rules use unification...

I looked at the code a bit more, and it looks like we disable
validation when we load rules in LT. That's why loading at run-time
works fine, and then I guess Java references the unifications that
are already loaded.

I'll keep looking for a good solution,
Andriy


2014-04-03 12:23 GMT-04:00 Daniel Naber daniel.na...@languagetool.org:
 On 2014-04-03 01:12, Andriy Rysin wrote:

 I guess we have two ways to go from here: adjust the tests to load
 files
 and keep them (I am not sure how easy it is - depends on how flexible
 our XMLValidator is)

 We're just using standard XML validation, I don't think there's much we
 can do (other than catching that specific exception, which would be very
 ugly). But not many rules are affected, what about moving those to
 grammar.xml? (I know, that's not very elegant either).

 Regards
   Daniel




Re: XML element and attribute statistics

2014-04-04 Thread Dave Pawson
On 4 April 2014 18:29, Andriy Rysin ary...@gmail.com wrote:
 Well moving the rules to common file will defeat the purpose of the
 split, especially when more and more rules will use unification...

Similar scenario: I create a temp common file, validate with that,
then rm the temp file?
  Suits both uses?

HTH DaveP





-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk



Re: prototype of new rule editor

2014-04-04 Thread Daniel Naber
On 2014-04-04 10:32, Marcin Miłkowski wrote:

 Another one: adding a second example (under Chrome) creates incorrect 
 XML:

Can you specify the exact workflow that leads to this? I couldn't 
reproduce so far.

Regards
  Daniel




Re: XML element and attribute statistics

2014-04-04 Thread Andriy Rysin
On 04/03/2014 12:23 PM, Daniel Naber wrote:
 On 2014-04-03 01:12, Andriy Rysin wrote:

 I guess we have two ways to go from here: adjust the tests to load 
 files
 and keep them (I am not sure how easy it is - depends on how flexible
 our XMLValidator is)
 We're just using standard XML validation, I don't think there's much we 
 can do (other than catching that specific exception, which would be very 
 ugly). But not many rules are affected, what about moving those to 
 grammar.xml? (I know, that's not very elegant either).

Here's a patch for a solution that I think should be acceptable: when
validating multiple grammar files, we extract all unification elements
from the first file and prepend them to the rest of the files.
Advantages:
* only tests are affected by this change
* only languages with multiple grammar XML files are affected
* low overhead (we re-include only the elements we need)
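
The merging step can be tried in isolation with in-memory XML. The following is a minimal sketch of the idea, not the patch itself: the class name `UnificationMergeSketch`, the `merge` helper and the tiny sample documents are made up for illustration, and the copies are inserted before the first child of the root element, which is a simplification.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class UnificationMergeSketch {

  static Document parse(DocumentBuilder builder, String xml) throws Exception {
    return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
  }

  /** Copies every unification element of baseXml to the front of the root element of ruleXml. */
  static Document merge(String baseXml, String ruleXml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document baseDoc = parse(builder, baseXml);
    Document ruleDoc = parse(builder, ruleXml);

    NodeList unifications = baseDoc.getElementsByTagName("unification");
    Node rules = ruleDoc.getDocumentElement();
    // If the root has no children this is null, and insertBefore then appends.
    Node insertionPoint = rules.getFirstChild();
    for (int i = 0; i < unifications.getLength(); i++) {
      // importNode makes a deep copy owned by ruleDoc; baseDoc is left untouched.
      Node copy = ruleDoc.importNode(unifications.item(i), true);
      rules.insertBefore(copy, insertionPoint);
    }
    return ruleDoc;
  }

  public static void main(String[] args) throws Exception {
    String base = "<rules><unification feature=\"number\"/><category name=\"A\"/></rules>";
    String extra = "<rules><category name=\"B\"/></rules>";
    Document merged = merge(base, extra);
    System.out.println(merged.getDocumentElement().getFirstChild().getNodeName());
  }
}
```

The merged document can then be wrapped in a DOMSource and handed to a javax.xml.validation.Validator, so nothing has to be written to disk.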

I would appreciate any feedback,
Thanks
Andriy
diff --git a/languagetool-core/src/test/java/org/languagetool/XMLValidator.java b/languagetool-core/src/test/java/org/languagetool/XMLValidator.java
index e113dbb..ce9c6d4 100644
--- a/languagetool-core/src/test/java/org/languagetool/XMLValidator.java
+++ b/languagetool-core/src/test/java/org/languagetool/XMLValidator.java
@@ -27,15 +27,22 @@ import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 import javax.xml.XMLConstants;
+import javax.xml.parsers.DocumentBuilder;
+import javax.xml.parsers.DocumentBuilderFactory;
 import javax.xml.parsers.ParserConfigurationException;
 import javax.xml.parsers.SAXParser;
 import javax.xml.parsers.SAXParserFactory;
+import javax.xml.transform.Source;
+import javax.xml.transform.dom.DOMSource;
 import javax.xml.transform.stream.StreamSource;
 import javax.xml.validation.Schema;
 import javax.xml.validation.SchemaFactory;
 import javax.xml.validation.Validator;
 
 import org.languagetool.tools.StringTools;
+import org.w3c.dom.Document;
+import org.w3c.dom.Node;
+import org.w3c.dom.NodeList;
 import org.xml.sax.InputSource;
 import org.xml.sax.SAXException;
 import org.xml.sax.SAXParseException;
@@ -123,6 +130,61 @@ public final class XMLValidator {
 
   /**
    * Validate XML file using the given XSD. Throws an exception on error.
+   * @param baseFilename File to prepend common parts (unification) from before validating main file
+   * @param filename File in classpath to validate
+   * @param xmlSchemaPath XML schema file in classpath
+   */
+  public void validateWithXmlSchema(String baseFilename, String filename, String xmlSchemaPath) throws IOException {
+    try {
+      final InputStream xmlStream = this.getClass().getResourceAsStream(filename);
+      final InputStream baseXmlStream = this.getClass().getResourceAsStream(baseFilename);
+      if (xmlStream == null || baseXmlStream == null ) {
+        throw new IOException("File not found in classpath: " + filename);
+      }
+      try {
+        final URL schemaUrl = this.getClass().getResource(xmlSchemaPath);
+        if (schemaUrl == null) {
+          throw new IOException("XML schema not found in classpath: " + xmlSchemaPath);
+        }
+        validateInternal(mergeIntoSource(baseXmlStream, xmlStream, this.getClass().getResource(xmlSchemaPath)), schemaUrl);
+      } finally {
+        xmlStream.close();
+      }
+    } catch (Exception e) {
+      throw new IOException("Cannot load or parse '" + filename + "'", e);
+    }
+  }
+
+
+  private static Source mergeIntoSource(InputStream baseXmlStream, InputStream xmlStream, URL xmlSchema) throws Exception {
+    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
+    domFactory.setIgnoringComments(true);
+    domFactory.setValidating(false);
+    domFactory.setNamespaceAware(true);
+
+    //SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
+    //Schema schema = sf.newSchema(xmlSchema);
+    //domFactory.setSchema(schema);
+
+    DocumentBuilder builder = domFactory.newDocumentBuilder();
+    Document baseDoc = builder.parse(baseXmlStream);
+    Document ruleDoc = builder.parse(xmlStream);
+
+    // Shall this be more generic, i.e. reuse not just unification ???
+    NodeList unificationNodes = baseDoc.getElementsByTagName("unification");
+    Node ruleNode = ruleDoc.getElementsByTagName("rules").item(0);
+    Node firstChildRuleNode = ruleNode.getChildNodes().item(1);
+
+    for(int i=0; i<unificationNodes.getLength(); i++) {
+      Node unificationNode = ruleDoc.importNode(unificationNodes.item(i), true);
+      ruleNode.insertBefore(unificationNode, firstChildRuleNode);
+    }
+
+    return new DOMSource(ruleDoc);
+  }
+  
+  /**
+   * Validate XML file using the given XSD. Throws an exception on error.
    * @param xml the XML string to be validated
    * @param xmlSchemaPath XML schema file in classpath
    * @since 2.3
@@ -171,6 +233,14 @@ public final class XMLValidator {
     validator.validate(new StreamSource(xml));
   }
 
+  private void