On 5 April 2014 17:11, Mike Unwalla <m...@techscribe.co.uk> wrote:
> Hi All,
>
>> But maybe the standard LT would benefit from your rules as well?
>
> I am happy to donate all or some of the rules that I developed for STE issue
> 3. The most recent version of the rules is on
> www.simplified-english.co.uk/installation.html.
>
> Most of the rules that I developed are specifically for STE and contain
> customized postags. Example:
> <token postag_regexp="yes"
> postag="STE_VERB_LEXICAL_BASE|STE_TVb_BASE|STE_TVb_2_WORD_BASE|PROJECT_TVb_BASE|PROJECT_TVb_2_WORD_BASE"></token>
>
> The STE rules must be 'fail safe'. To develop rules that give correct
> results with all words in the English lexicon is difficult.

If you can define a transform I'll write a stylesheet to do it
(perhaps leaving the extra tags as comments).

HTH

>
>> I don't want to make the rule set for the journal part of the standard
>> distribution, as they are quite specific. At the same time, I want to use
>> standard rules. So I simply want to open the additional rule set before I
>> make the check.
>
> This is similar to my situation. Also, when I check a text, I use more than
> one rule set. The STE rules that are on the simplified-english website are
> the 'core', as defined by the STEMG (www.asd-ste100.org). For each project,
> I have a grammar file and a disambiguation file
> (www.simplified-english.co.uk/design.html has a picture). When I check a
> text, I use both the core STE files and the project files.
>
> Some scenarios for the use of user files are as follows:
> * Single-user environment. User wants to use standalone LT and LT in
>   OpenOffice. Currently, the user must copy/paste the files from the
>   standalone directory to an OpenOffice directory. (Testrules is available
>   only with standalone, thus, to develop user rules, that version of LT is
>   always necessary.)
> * Multi-user environment. Grammar and disambiguation files are on a server.
>   LT accesses these files only.
> * Multi-user environment. Grammar and disambiguation files are on a server.
>   LT simultaneously accesses these files and project-specific grammar files
>   that are on a user's computer.
>
> Possibly, one option is to split the disambiguation file into 2 parts. (And
> similarly with the grammar file.)
> The first part is only a 'wrapper', which
> refers to the default LT disambiguation file:
>
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE doc [
> <!ENTITY DefaultLTDisambiguation SYSTEM
> "org/languagetool/resource/en/disambiguation-default.xml">
> ]>
> <rules lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:noNamespaceSchemaLocation="http://svn.code.sf.net/p/languagetool/code/trunk/languagetool/languagetool-core/src/main/resources/org/languagetool/resource/disambiguation.xsd">
>
> &DefaultLTDisambiguation; <!-- The content of the current
> disambiguation.xml, but without the rules element -->
>
> <!--An explanation of how to add external entities goes here. -->
> </rules>
>
> 'Out of the box', LT works as usual. However, a user can edit the 'wrapper'
> disambiguation file to make LT use other rule sets.
>
> Possible problem 1: Because the user can install LT anywhere, the path for
> DefaultLTDisambiguation must be relative to the installation directory. But,
> that can cause a validating XML editor to show an error and not open the
> file. If the user wants to use a validating XML editor, the solution is to
> edit the file with the full path.
>
> Possible problem 2: Dave Pawson suggested that xInclude is preferable to
> entities (http://sourceforge.net/p/languagetool/mailman/message/32177932/).
>
> Possible problem 3: Each time that the user updates LT, the user must edit
> the 'wrapper' disambiguation file or copy/paste from the previous LT
> version. (But, with the integrate attribute, presumably a user must specify
> the location of the user file(s), so the same problem exists with that.)
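
For possible problems 1 and 3, one workaround might be to regenerate the
wrapper after each installation or update instead of editing it by hand.
Below is a minimal Python sketch of that idea, not anything that LT provides.
The function name build_wrapper, the UserRules entity names, and the example
paths are hypothetical, and the sketch assumes that LT's XML parser resolves
external entities as the proposal above requires. The DOCTYPE is named rules
here so that it matches the root element, which a validating parser expects.

```python
from pathlib import Path

# Relative location of the default rules, copied from the wrapper example above.
DEFAULT_RULES = "org/languagetool/resource/en/disambiguation-default.xml"


def build_wrapper(install_dir, user_files=()):
    """Return wrapper XML whose entities point at absolute file URIs.

    install_dir is the (assumed) LT installation directory; user_files are
    optional extra rule files. All names here are illustrative only.
    """
    sources = [Path(install_dir).resolve() / DEFAULT_RULES]
    sources += [Path(name).resolve() for name in user_files]

    entities = []
    references = []
    for index, source in enumerate(sources):
        # The first entity keeps the name used in the proposal; the others
        # get hypothetical names UserRules1, UserRules2, ...
        name = "DefaultLTDisambiguation" if index == 0 else f"UserRules{index}"
        entities.append(f'  <!ENTITY {name} SYSTEM "{source.as_uri()}">')
        references.append(f"  &{name};")

    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<!DOCTYPE rules ["]
    lines += entities
    lines += ["]>", '<rules lang="en">']
    lines += references
    lines += ["</rules>", ""]
    return "\n".join(lines)
```

Something like print(build_wrapper("/opt/LanguageTool",
["/home/user/ste-project.xml"])) would then print a wrapper with full paths,
which a validating editor should be able to open. The included files must
still omit their own rules element, as in the proposal.
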
>
> Regards,
>
> Mike Unwalla
> Contact: www.techscribe.co.uk/techw/contact.htm
>
>
> -----Original Message-----
> From: Marcin Milkowski [mailto:list-addr...@wp.pl]
> Sent: 05 April 2014 08:03
> To: languagetool-devel@lists.sourceforge.net
> Subject: Re: External rule files
>
> On 2014-04-04 19:24, Mike Unwalla wrote:
>> Hi All,
>>
>>> I'm not sure why Mike Unwalla doesn't want to use our disambiguation
>>> rules
>>
>> I do not have a fundamental objection to using the LT disambiguation file
>> with the STE rules. Part of the reason that I now do not use the LT
>> disambiguation rules is historical.
>>
>> The LT disambiguation rules are not sufficient for the STE term checker.
>> Examples:
>> * A part-of-speech disambiguator is necessary (primarily for noun/verb
>>   disambiguation).
>> * Each term that is in the STE specification must be specified in the
>>   disambiguation rules with its approved and not-approved parts of speech.
>>
>> When I started to write the STE disambiguation rules, I did not know how to
>> add rules to an external file
>> (http://wiki.languagetool.org/tips-and-tricks#toc2). Therefore, the
>> disambiguation file was in <installation path>\org\languagetool\resource\en.
>>
>> If I add the STE rules at the end of the LT disambiguation file, each time
>> that I update LT, I must copy/paste the STE rules into the new LT
>> disambiguation file. If some part of the new LT disambiguation has an effect
>> on the STE rules, I must change the STE rules. Most of the rules in the LT
>> disambiguation file are not applicable to the STE rules. Therefore, my
>> easiest option was to write a completely new disambiguation file.
>
> I can see. But maybe the standard LT would benefit from your rules as well?
>
>>> Maybe there's place for a third value, when you want to use the existing
>>> language with its tokenization, tagger and all, but you don't want to use
>>> its rules (integrate="replace_only_rules").
>>
>> What is the difference between this third option and the second option,
>> where you replace the LT rules with customized rules?
>
> If you just replace the rules, you can use the tagger and the tokenizer.
> Without it, nothing like that is available, unless you copy your rules
> over the standard distribution.
>
>>
>>> I also added a new (now unused) attribute to <rules> element but the idea
>>> is simple: If you have integrate="add", then rules will be added,
>>
>> Why is this attribute necessary? What are the problems with an external rule
>> file (http://wiki.languagetool.org/tips-and-tricks#toc2) that the new
>> attribute solves?
>
> I am editor of a scientific journal, and we have certain conventions
> that are not universal (mostly for the footnote references and
> consistent typography). I don't want to make the rule set for the
> journal part of the standard distribution, as they are quite specific. At
> the same time, I want to use standard rules. So I simply want to open
> the additional rule set before I make the check.
>
> Regards,
> Marcin
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk