My 2 cents:

I would approve only step 2. Step 1 makes rules harder to read, which
makes it harder for newcomers to decipher what existing rules mean and
to learn to write new ones.

Regards, m.

2012/12/30 Marcin Miłkowski <list-addr...@wp.pl>:
> Hi,
>
> What is the problem you are trying to solve? There is a lot of redundancy
> in XML, but your third step makes rules much harder to check (two
> different ways to make mistakes, no XML-based syntax checks possible
> anymore). If your editor does not support large XML files, then changing
> the editor seems the best solution. I'm fine with step one and step two,
> though. But step 3 is not a good idea, as you're trying to reinvent the
> wheel - adding exceptions would be a nightmare in the new syntax scheme.
> Plus, there will be new syntax to learn for us...
>
> We can zip the files if the download size is the problem you are trying
> to solve.
>
> Best,
> Marcin
>
> On 2012-12-30 21:56, Daniel Naber wrote:
>> Hi,
>>
>> we have three languages with grammar files that are more than 1 MB large
>> (German, French, Catalan). The German grammar.xml has more than 24,000
>> lines. This size makes editing the files difficult. I have some ideas on how
>> to improve the situation and I'm looking for other ideas and comments:
>>
>> Step 1 - the easy one
>>
>> We can make the syntax a bit more compact and readable by changing some
>> elements:
>>
>> <marker> => <m>
>> <suggestion> => <s>
>> <example type="correct"> => <right>
>> <example type="incorrect"> => <wrong>
>>
>>
>> Step 2 - less repetition (also easy to implement)
>>
>> The contents of <message>, <url>, and <short> should be inherited from a
>> <rulegroup> element to its <rule> elements. This way those elements do not
>> need to be repeated if they are the same for all rules of a rulegroup.
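
To make the inheritance in step 2 concrete, here is a minimal sketch of how a loader might resolve it. This is only an illustration under my own assumptions, not how LanguageTool implements it: the helper name apply_inheritance, the INHERITED tuple, and the sample rulegroup are all hypothetical, and I assume a rule's own element always overrides the one on the rulegroup.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical: the elements a <rule> may inherit from its <rulegroup>.
INHERITED = ("message", "url", "short")


def apply_inheritance(rulegroup):
    """Copy missing <message>/<url>/<short> from a rulegroup into its rules."""
    # find() only looks at direct children, so nested elements are ignored.
    defaults = {tag: rulegroup.find(tag) for tag in INHERITED}
    for rule in rulegroup.findall("rule"):
        for tag, default in defaults.items():
            # Assumption: a rule's own element wins; inherit only if absent.
            if default is not None and rule.find(tag) is None:
                rule.append(copy.deepcopy(default))
    return rulegroup


# Made-up sample data, only for demonstration.
GROUP_XML = """
<rulegroup id="DEMO" name="demo group">
  <message>Did you mean <suggestion>foo</suggestion>?</message>
  <short>Possible typo</short>
  <rule>
    <pattern><token>fo</token></pattern>
  </rule>
  <rule>
    <pattern><token>fooo</token></pattern>
    <message>Too many o's.</message>
  </rule>
</rulegroup>
"""

group = apply_inheritance(ET.fromstring(GROUP_XML))
for rule in group.findall("rule"):
    message = "".join(rule.find("message").itertext())
    print(rule.find("pattern/token").text, "->", message)
```

The first rule picks up the shared message and short text, while the second keeps its own message, so existing files that repeat these elements in every rule would keep working unchanged.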
>>
>>
>> Step 3 - an XML-free pattern
>>
>> Add a compact way to describe simple patterns. This is best explained by
>> example. What is now this:
>>
>> <pattern>
>>    <token regexp="yes">foo|bar</token>
>>    <marker>
>>      <token>myerror</token>
>>    </marker>
>> </pattern>
>>
>> ...could be written like this:
>>
>> <p>re:foo|bar _myerror_</p>
>>
>> Thus you don't need "<token>" at all as a whitespace implies a token
>> boundary. The prefix "re:" turns on regular expression matching (the same
>> for "pos:" -> POS tag, "pos:re:" -> POS tag regex). "<marker>" is replaced
>> by underscores. This does not support exceptions and other advanced
>> features, but it turns a 6-line rule into a 1-line rule. This new syntax is
>> optional, i.e. the old one can still be used.
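
For discussion, the expansion from the compact form back to the current XML could be sketched roughly like this. It is a sketch under assumptions, not a proposed implementation: the function name compact_to_pattern is hypothetical, I assume the POS prefixes map to "postag" and "postag_regexp" attributes on <token>, and I assume a leading underscore opens a marker and a trailing underscore closes it (so a token that itself starts or ends with "_" cannot be expressed).

```python
import xml.etree.ElementTree as ET


def compact_to_pattern(compact):
    """Expand e.g. 're:foo|bar _myerror_' into a <pattern> element."""
    pattern = ET.Element("pattern")
    parent = pattern  # element that receives the next <token>
    for word in compact.split():  # whitespace implies a token boundary
        if word.startswith("_"):
            # Assumption: a leading underscore opens a <marker>.
            parent = ET.SubElement(pattern, "marker")
            word = word[1:]
        closes_marker = word.endswith("_")
        if closes_marker:
            word = word[:-1]
        token = ET.SubElement(parent, "token")
        if word.startswith("pos:re:"):
            # Assumed attribute names for POS tag regex matching.
            token.set("postag", word[len("pos:re:"):])
            token.set("postag_regexp", "yes")
        elif word.startswith("pos:"):
            token.set("postag", word[len("pos:"):])
        elif word.startswith("re:"):
            token.set("regexp", "yes")
            token.text = word[len("re:"):]
        else:
            token.text = word
        if closes_marker:
            # Tokens after the closing underscore go back into <pattern>.
            parent = pattern
    return pattern


print(ET.tostring(compact_to_pattern("re:foo|bar _myerror_"), encoding="unicode"))
```

Since the compact form would be expanded to the existing elements at load time, the rest of the rule (message, examples) and the matching code would not need to change. Exceptions and other advanced features have no place in this syntax, which is the limitation already mentioned above.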
>>
>> What do you think about that? Other suggestions for making rule syntax more
>> compact?
>>
>> Regards
>>   Daniel
>>
>
>
> ------------------------------------------------------------------------------
> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
> MVPs and experts. ON SALE this month only -- learn more at:
> http://p.sf.net/sfu/learnmore_123012
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
