Hello,

Readability is more important than decreasing the size of a file. In my
opinion, Step 1 and Step 3 decrease readability. '<marker>' is clearer than
'<m>'.

In a related reply, Dominique wrote:
        It will only marginally reduce size. But shorter tags add less noise
        so it's clearer in my opinion. <m> and <s> may look less readable
        than <marker> and <suggestion> but since rule developers
        use them all the time, they would be well familiar with them.

I do not create rules each day. Typically, I work with LT each day for 2 or
3 weeks. Then, I work on other projects for weeks or months. 

Regards,

Mike Unwalla
Contact: www.techscribe.co.uk/techw/contact.htm 


-----Original Message-----
From: Daniel Naber [mailto:list2...@danielnaber.de] 
Sent: 30 December 2012 20:56
To: development discussion for LanguageTool
Subject: making XML rules more compact?

Hi,

we have three languages with grammar files that are more than 1 MB large 
(German, French, Catalan). The German grammar.xml has more than 24,000 
lines. This size makes editing the files difficult. I have some ideas on how
to improve the situation and I'm looking for other ideas and comments:

Step 1 - the easy one

We can make the syntax a bit more compact and readable by changing some 
elements:

<marker> => <m>
<suggestion> => <s>
<example type="correct"> => <right>
<example type="incorrect"> => <wrong>
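
To make the mapping concrete, here is a rough sketch of how existing rules
could be converted mechanically. It is only an illustration, not part of
LanguageTool: the function name compact_tags and the sample rule are made up,
and it assumes the rule XML can be parsed with Python's standard ElementTree.

```python
import xml.etree.ElementTree as ET

# Renames proposed in Step 1. The two <example> variants depend on the
# "type" attribute, so they are handled separately.
RENAMES = {"marker": "m", "suggestion": "s"}
EXAMPLE_TYPES = {"correct": "right", "incorrect": "wrong"}


def compact_tags(root):
    """Rename elements in place according to the Step 1 proposal (hypothetical helper)."""
    for el in root.iter():
        if el.tag in RENAMES:
            el.tag = RENAMES[el.tag]
        elif el.tag == "example" and el.get("type") in EXAMPLE_TYPES:
            # The element name now carries the information, so drop the attribute.
            el.tag = EXAMPLE_TYPES[el.attrib.pop("type")]
    return root


# Made-up sample rule, only for demonstration.
rule = ET.fromstring(
    '<rule><pattern><marker><token>myerror</token></marker></pattern>'
    '<message>Use <suggestion>my error</suggestion>.</message>'
    '<example type="incorrect">This is myerror.</example>'
    '<example type="correct">This is my error.</example></rule>'
)
compact_xml = ET.tostring(compact_tags(rule), encoding="unicode")
print(compact_xml)
```

The reverse mapping is just as mechanical, so old and new spellings could be
converted in either direction.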


Step 2 - less repetition (also easy to implement)

The contents of <message>, <url>, and <short> should be inherited from a 
<rulegroup> element to its <rule> elements. This way those elements do not 
need to be repeated if they are the same for all rules of a rulegroup.
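
A minimal sketch of the inheritance idea, using Python's ElementTree rather
than the actual LanguageTool loader. The helper name expand_rulegroup and the
sample rulegroup are made up, and the sketch assumes that an element defined
directly in a <rule> overrides the one from its <rulegroup>:

```python
import copy
import xml.etree.ElementTree as ET

# Elements that Step 2 proposes to inherit from <rulegroup>.
INHERITED = ("message", "url", "short")


def expand_rulegroup(group):
    """Copy group-level elements into each <rule> that lacks its own (hypothetical helper)."""
    defaults = {tag: group.find(tag) for tag in INHERITED}
    for rule in group.findall("rule"):
        for tag, default in defaults.items():
            # Assumption: a rule's own element wins over the inherited one.
            if default is not None and rule.find(tag) is None:
                rule.append(copy.deepcopy(default))
    return group


# Made-up rulegroup: the first rule inherits, the second overrides.
group = ET.fromstring("""
<rulegroup id="DEMO" name="demo">
  <message>Shared message.</message>
  <rule><pattern><token>foo</token></pattern></rule>
  <rule><pattern><token>bar</token></pattern><message>Own message.</message></rule>
</rulegroup>
""")
expand_rulegroup(group)
messages = [rule.find("message").text for rule in group.findall("rule")]
print(messages)
```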


Step 3 - an XML-free pattern

Add a compact way to describe simple patterns. This is best explained by 
example. What is now this:

<pattern>
  <token regexp="yes">foo|bar</token>
  <marker>
    <token>myerror</token>
  </marker>
</pattern>

...could be written like this:

<p>re:foo|bar _myerror_</p>

Thus you don't need "<token>" at all, as whitespace implies a token 
boundary. The prefix "re:" turns on regular expression matching (the same 
for "pos:" -> POS tag, "pos:re:" -> POS tag regex). "<marker>" is replaced 
by underscores. This does not support exceptions and other advanced 
features, but it turns a 6-line pattern into a 1-line pattern. This new 
syntax is optional, i.e. the old one can still be used.
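
As a rough sketch of how the compact form could be expanded back into the
existing syntax (Python with ElementTree, not actual LanguageTool code): the
function name parse_compact is made up, and the sketch assumes that a leading
underscore opens a marker, a trailing underscore closes it, and POS tags map
to the existing postag and postag_regexp attributes.

```python
import xml.etree.ElementTree as ET


def parse_compact(pattern_text):
    """Expand a compact pattern string into a <pattern> element (hypothetical helper)."""
    pattern = ET.Element("pattern")
    parent = pattern
    for word in pattern_text.split():  # whitespace implies a token boundary
        opens = word.startswith("_")
        closes = word.endswith("_") and len(word) > 1
        if opens:
            word = word[1:]
            parent = ET.SubElement(pattern, "marker")
        if closes:
            word = word[:-1]
        token = ET.SubElement(parent, "token")
        if word.startswith("pos:re:"):
            token.set("postag", word[len("pos:re:"):])
            token.set("postag_regexp", "yes")
        elif word.startswith("pos:"):
            token.set("postag", word[len("pos:"):])
        elif word.startswith("re:"):
            token.set("regexp", "yes")
            token.text = word[len("re:"):]
        else:
            token.text = word
        if closes:
            parent = pattern  # following tokens are outside the marker again
    return pattern


expanded = parse_compact("re:foo|bar _myerror_")
xml_text = ET.tostring(expanded, encoding="unicode")
print(xml_text)
```

For the example above this yields the same structure as the six-line
pattern. A literal token that itself starts or ends with an underscore would
need an escape mechanism, which this sketch does not attempt.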

What do you think about that? Other suggestions for making rule syntax more 
compact?

Regards
 Daniel

