My 2 cents: I would approve only step 2. Step 1 makes rules harder to read, which makes it harder for newcomers to decipher what some rules mean and to learn to write new rules.
Regards,
m.

2012/12/30 Marcin Miłkowski <list-addr...@wp.pl>:
> Hi,
>
> What is the problem you are trying to solve? There is a lot of redundancy
> in XML, but your third step makes rules much harder to check (two
> different ways to make mistakes, no XML-based syntax checks possible
> anymore). If your editor does not support large XML files, then changing
> the editor seems the best solution. I'm fine with step one and step two,
> though. But step 3 is not a good idea, as you're trying to reinvent the
> wheel - adding exceptions would be a nightmare in the new syntax scheme.
> Plus, there will be new syntax to learn for us...
>
> We can zip the files if the download size is the problem you are trying
> to solve.
>
> Best,
> Marcin
>
> On 2012-12-30 21:56, Daniel Naber wrote:
>> Hi,
>>
>> we have three languages with grammar files that are more than 1 MB large
>> (German, French, Catalan). The German grammar.xml has more than 24,000
>> lines. This size makes editing the files difficult. I have some ideas on how
>> to improve the situation and I'm looking for other ideas and comments:
>>
>> Step 1 - the easy one
>>
>> We can make the syntax a bit more compact and readable by changing some
>> elements:
>>
>> <marker> => <m>
>> <suggestion> => <s>
>> <example type="correct"> => <right>
>> <example type="incorrect"> => <wrong>
>>
>>
>> Step 2 - less repetition (also easy to implement)
>>
>> The contents of <message>, <url>, and <short> should be inherited from a
>> <rulegroup> element to its <rule> elements. This way those elements do not
>> need to be repeated if they are the same for all rules of a rulegroup.
>>
>>
>> Step 3 - an XML-free pattern
>>
>> Add a compact way to describe simple patterns. This is best explained by
>> example. What is now this:
>>
>> <pattern>
>>   <token regexp="yes">foo|bar</token>
>>   <marker>
>>     <token>myerror</token>
>>   </marker>
>> </pattern>
>>
>> ...could be written like this:
>>
>> <p>re:foo|bar _myerror_</p>
>>
>> Thus you don't need "<token>" at all as a whitespace implies a token
>> boundary. The prefix "re:" turns on regular expression matching (the same
>> for "pos:" -> POS tag, "pos:re:" -> POS tag regex). "<marker>" is replaced
>> by underscores. This does not support exceptions and other advanced
>> features, but it turns a 6-line rule into a 1-line rule. This new syntax is
>> optional, i.e. the old one can still be used.
>>
>> What do you think about that? Other suggestions for making rule syntax more
>> compact?
>>
>> Regards
>> Daniel
>>
>
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
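
For reference, the compact notation from step 3 could in principle be expanded back into the current XML mechanically. Below is a minimal Python sketch of that idea, not an actual LanguageTool implementation. The function name expand_compact_pattern is hypothetical. Mapping "pos:" to a postag attribute and "pos:re:" to postag plus postag_regexp="yes" is my assumption about how the prefixes would translate, as is letting a marker start at a leading underscore and end at a trailing one. Exceptions and other advanced features are not handled, just as in the proposal.

```python
import xml.etree.ElementTree as ET


def expand_compact_pattern(compact: str) -> ET.Element:
    """Expand e.g. 're:foo|bar _myerror_' into a <pattern> element (sketch)."""
    pattern = ET.Element("pattern")
    parent = pattern  # element that receives the next <token>
    for word in compact.split():  # whitespace implies a token boundary
        # Assumption: a leading underscore opens <marker>, a trailing one closes it.
        opens = word.startswith("_")
        closes = word.endswith("_") and len(word) > 1
        word = word[1:] if opens else word
        word = word[:-1] if closes else word
        if opens:
            parent = ET.SubElement(pattern, "marker")
        token = ET.SubElement(parent, "token")
        if word.startswith("pos:re:"):
            # Assumed mapping: POS tag matched as a regular expression.
            token.set("postag", word[len("pos:re:"):])
            token.set("postag_regexp", "yes")
        elif word.startswith("pos:"):
            # Assumed mapping: literal POS tag.
            token.set("postag", word[len("pos:"):])
        elif word.startswith("re:"):
            token.set("regexp", "yes")
            token.text = word[len("re:"):]
        else:
            token.text = word
        if closes:
            parent = pattern
    return pattern


xml_text = ET.tostring(
    expand_compact_pattern("re:foo|bar _myerror_"), encoding="unicode"
)
print(xml_text)
```

Run on the example from the proposal, this prints the same <pattern> structure as the six-line version, only without indentation. A regex token that itself begins or ends with an underscore would be misread as a marker boundary, which is one concrete case of the "two different ways to make mistakes" concern raised above.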