Hello,

Readability is more important than reducing the size of a file. In my opinion, Step 1 and Step 3 decrease readability. '<marker>' is clearer than '<m>'.
In a related reply, Dominique wrote:

It will only marginally reduce size. But shorter tags add less noise, so it's clearer in my opinion. <m> and <s> may look less readable than <marker> and <suggestion>, but since rule developers use them all the time, they would be well familiar with them.

I do not create rules each day. Typically, I work with LT each day for 2 or 3 weeks. Then, I work on other projects for weeks or months.

Regards,
Mike Unwalla
Contact: www.techscribe.co.uk/techw/contact.htm

-----Original Message-----
From: Daniel Naber [mailto:list2...@danielnaber.de]
Sent: 30 December 2012 20:56
To: development discussion for LanguageTool
Subject: making XML rules more compact?

Hi,

we have three languages with grammar files that are more than 1 MB large (German, French, Catalan). The German grammar.xml has more than 24,000 lines. This size makes editing the files difficult. I have some ideas on how to improve the situation and I'm looking for other ideas and comments:

Step 1 - the easy one

We can make the syntax a bit more compact and readable by changing some elements:

<marker> => <m>
<suggestion> => <s>
<example type="correct"> => <right>
<example type="incorrect"> => <wrong>

Step 2 - less repetition (also easy to implement)

The contents of <message>, <url>, and <short> should be inherited from a <rulegroup> element to its <rule> elements. This way those elements do not need to be repeated if they are the same for all rules of a rulegroup.

Step 3 - an XML-free pattern

Add a compact way to describe simple patterns. This is best explained by example. What is now this:

    <pattern>
      <token regexp="yes">foo|bar</token>
      <marker>
        <token>myerror</token>
      </marker>
    </pattern>

...could be written like this:

    <p>re:foo|bar _myerror_</p>

Thus you don't need "<token>" at all, as a whitespace implies a token boundary. The prefix "re:" turns on regular expression matching (the same for "pos:" -> POS tag, "pos:re:" -> POS tag regex). "<marker>" is replaced by underscores.
This does not support exceptions and other advanced features, but it turns a 6-line rule into a 1-line rule. This new syntax is optional, i.e. the old one can still be used.

What do you think about that? Other suggestions for making rule syntax more compact?

Regards
Daniel

_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel
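
To make the Step 3 proposal in Daniel's message concrete, here is a rough Python sketch of how the compact form might be expanded back into the existing XML. It is only an illustration, not LanguageTool code: the function names `expand_compact_pattern` and `to_xml` are hypothetical, the mapping of "pos:" and "pos:re:" to the `postag` and `postag_regexp` attributes is an assumption about how the proposal would be wired up, and it only handles a marker around a single token.

```python
import xml.etree.ElementTree as ET


def expand_compact_pattern(compact: str) -> ET.Element:
    """Expand a compact pattern string into a <pattern> element (hypothetical helper)."""
    pattern = ET.Element("pattern")
    # Whitespace implies a token boundary, as described in Step 3.
    for word in compact.split():
        # Assumption: _word_ marks a single token; multi-token markers are not handled.
        marked = len(word) > 2 and word.startswith("_") and word.endswith("_")
        if marked:
            word = word[1:-1]
        parent = ET.SubElement(pattern, "marker") if marked else pattern
        token = ET.SubElement(parent, "token")
        # Check the longest prefix first so "pos:re:" is not mistaken for "pos:".
        if word.startswith("pos:re:"):
            token.set("postag", word[len("pos:re:"):])
            token.set("postag_regexp", "yes")  # assumed attribute mapping
        elif word.startswith("pos:"):
            token.set("postag", word[len("pos:"):])
        elif word.startswith("re:"):
            token.set("regexp", "yes")
            token.text = word[len("re:"):]
        else:
            token.text = word
    return pattern


def to_xml(compact: str) -> str:
    """Serialize the expanded pattern as a string without an XML declaration."""
    return ET.tostring(expand_compact_pattern(compact), encoding="unicode")


if __name__ == "__main__":
    print(to_xml("re:foo|bar _myerror_"))
```

Run on the example from the message, this yields the same structure as the six-line pattern, serialized on one line. Exceptions and other advanced features are deliberately left out, which matches the limitation Daniel mentions.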