Hi,

On 2012-12-31 14:01, Mauro Condarelli wrote:
> Hi All,
>
> On 31/12/2012 12:48, Mike Unwalla wrote:
>> Hello,
>>
>> Readability is more important than decreasing the size of a file. In my
>> opinion, Step 1 and Step 3 decrease readability. '<marker>' is clearer than
>> '<m>'.
> I completely agree with the above.
>
> The point I was trying to make is that XML doesn't look suited to
> describing a set of production rules for text transformation
> (disambiguator) or syntax check (grammar).
>
> In such a case it's common to devise a DSL (Domain Specific Language)
> that describes the problem precisely and thus greatly improves
> readability and maintainability.
> The downside of this approach is the need to build a complete toolchain
> for the new language, including a suitable editor and a compiler.
>
> I was pointing out that Eclipse includes all the tools needed to build
> the necessary framework with very little effort (actually little more
> than writing the BNF grammar for the DSL itself).

Well, I'm not sure this will be so easy, as converting XML languages
into BNF is not a completely trivial business. There are no standard
converters between XML Schema and BNF, for example, and I'm not sure
whether XSD is context-free the way BNF is. It might be higher in the
Chomsky hierarchy because it allows for some context-sensitivity in
element names and regular expressions on the right-hand side of
productions... I'm not sure how much of this is actually used in our
.xsd files.

> This can be deployed into Eclipse
> itself (as a plugin) or wrapped in a stand-alone "RCP" application
> acting as a (very fat) editor (complete with syntax highlighting,
> on-the-fly error detection and auto-completion) for the language files

We already have XML editors that do that, and more.

> that, as a "side effect", also produces some suitable representation of
> the semantics. This "suitable representation" could be in the form of
> compilable Java classes (for speed) or even the current XML syntax (for
> compatibility).

Well, it would be nice to compile our rules for speed, but for the user
I still think that a database-like front-end would be much better. The
DSL seems to replace one hard-to-learn language with another
hard-to-learn language.

Regards,
Marcin

>
> Regards
> Mauro
>> In a related reply, Dominique wrote:
>> It will only marginally reduce size. But shorter names add less noise,
>> so it's clearer in my opinion. <m> and <s> may look less readable
>> than <marker> and <suggestion> but since rule developers
>> use them all the time, they would be well familiar with them.
>>
>> I do not create rules each day. Typically, I work with LT each day for 2 or
>> 3 weeks. Then, I work on other projects for weeks or months.
>>
>> Regards,
>>
>> Mike Unwalla
>> Contact: www.techscribe.co.uk/techw/contact.htm
>>
>>
>> -----Original Message-----
>> From: Daniel Naber [mailto:list2...@danielnaber.de]
>> Sent: 30 December 2012 20:56
>> To: development discussion for LanguageTool
>> Subject: making XML rules more compact?
>>
>> Hi,
>>
>> we have three languages with grammar files that are larger than 1 MB
>> (German, French, Catalan). The German grammar.xml has more than 24,000
>> lines. This size makes editing the files difficult. I have some ideas on how
>> to improve the situation and I'm looking for other ideas and comments:
>>
>> Step 1 - the easy one
>>
>> We can make the syntax a bit more compact and readable by changing some
>> elements:
>>
>> <marker> => <m>
>> <suggestion> => <s>
>> <example type="correct"> => <right>
>> <example type="incorrect"> => <wrong>
>>
>>
>> Step 2 - less repetition (also easy to implement)
>>
>> The contents of <message>, <url>, and <short> should be inherited from a
>> <rulegroup> element by its <rule> elements. This way those elements do not
>> need to be repeated if they are the same for all rules of a rulegroup.
>>
>>
>> Step 3 - an XML-free pattern
>>
>> Add a compact way to describe simple patterns. This is best explained by
>> example. What is now this:
>>
>> <pattern>
>>   <token regexp="yes">foo|bar</token>
>>   <marker>
>>     <token>myerror</token>
>>   </marker>
>> </pattern>
>>
>> ...could be written like this:
>>
>> <p>re:foo|bar _myerror_</p>
>>
>> Thus you don't need "<token>" at all, as whitespace implies a token
>> boundary. The prefix "re:" turns on regular expression matching (the same
>> for "pos:" -> POS tag, "pos:re:" -> POS tag regex). "<marker>" is replaced
>> by underscores. This does not support exceptions and other advanced
>> features, but it turns a 6-line rule into a 1-line rule. This new syntax is
>> optional, i.e. the old one can still be used.
>>
>> What do you think about that?
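A minimal sketch of how the Step 3 shorthand could be expanded back into
the current XML, written in Python purely for illustration. The function
names and the marker handling (a leading underscore opens <marker>, a
trailing one closes it) are assumptions, not part of the proposal. The
postag and postag_regexp attributes are assumed to be the XML equivalents
of the "pos:" and "pos:re:" prefixes. Exceptions and other advanced
features are left out, as in the proposal.

```python
from xml.sax.saxutils import escape, quoteattr


def token_xml(word):
    """Turn one whitespace-separated word of the shorthand into a <token>."""
    if word.startswith("pos:re:"):
        # Assumed mapping: POS tag matched as a regular expression.
        return '<token postag=%s postag_regexp="yes"/>' % quoteattr(word[7:])
    if word.startswith("pos:"):
        # Assumed mapping: exact POS tag.
        return "<token postag=%s/>" % quoteattr(word[4:])
    if word.startswith("re:"):
        return '<token regexp="yes">%s</token>' % escape(word[3:])
    return "<token>%s</token>" % escape(word)


def expand_compact_pattern(compact):
    """Expand e.g. 're:foo|bar _myerror_' into a <pattern> element."""
    out = ["<pattern>"]
    in_marker = False
    for word in compact.split():
        # A leading underscore opens the marker (hypothetical convention).
        if not in_marker and len(word) > 1 and word.startswith("_"):
            out.append("  <marker>")
            in_marker = True
            word = word[1:]
        # A trailing underscore closes it again.
        closes = in_marker and len(word) > 1 and word.endswith("_")
        if closes:
            word = word[:-1]
        indent = "    " if in_marker else "  "
        out.append(indent + token_xml(word))
        if closes:
            out.append("  </marker>")
            in_marker = False
    if in_marker:
        raise ValueError("marker opened with '_' but never closed")
    out.append("</pattern>")
    return "\n".join(out)


if __name__ == "__main__":
    print(expand_compact_pattern("re:foo|bar _myerror_"))
```

Run on the shorthand from the example, "re:foo|bar _myerror_", this gives
back the six-line <pattern> form, so a converter of this kind would let
both syntaxes coexist.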
>> Other suggestions for making rule syntax more compact?
>>
>> Regards
>> Daniel
>>
>>
>> ------------------------------------------------------------------------------
>> Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
>> MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
>> with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
>> MVPs and experts. SALE $99.99 this month only -- learn more at:
>> http://p.sf.net/sfu/learnmore_122412
>> _______________________________________________
>> Languagetool-devel mailing list
>> Languagetool-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel