On 5 April 2014 17:11, Mike Unwalla <m...@techscribe.co.uk> wrote:
> Hi All,
>
>> But maybe the standard LT would benefit from your rules as well?
>
> I am happy to donate all or some of the rules that I developed for STE issue
> 3. The most recent version of the rules is on
> www.simplified-english.co.uk/installation.html.
>
> Most of the rules that I developed are specifically for STE and contain
> customized postags. Example:
>  <token postag_regexp="yes"
> postag="STE_VERB_LEXICAL_BASE|STE_TVb_BASE|STE_TVb_2_WORD_BASE|PROJECT_TVb_BASE|PROJECT_TVb_2_WORD_BASE"></token>
>
> The STE rules must be 'fail safe'. To develop rules that give correct
> results with all words in the English lexicon is difficult.

If you can define a transform, I'll write a stylesheet to do it
(perhaps leaving the extra tags as comments).
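
For example, here is a minimal XSLT 1.0 sketch of such a transform. It assumes the rule is "comment out any postag that names an STE_ or PROJECT_ tag". The thread does not define the actual mapping, so the two prefixes and the choice to drop the postag and postag_regexp attributes are assumptions.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical sketch: move STE-specific postags of an LT rule file into comments. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Tokens whose postag mentions an STE_ or PROJECT_ tag (assumed prefixes). -->
  <xsl:template match="token[contains(@postag, 'STE_') or contains(@postag, 'PROJECT_')]">
    <!-- Keep the original postag value as a comment so nothing is lost. -->
    <xsl:comment>
      <xsl:text> STE postag: </xsl:text>
      <xsl:value-of select="@postag"/>
      <xsl:text> </xsl:text>
    </xsl:comment>
    <!-- Copy the token itself without its postag and postag_regexp attributes. -->
    <xsl:copy>
      <xsl:apply-templates
          select="@*[name() != 'postag' and name() != 'postag_regexp']|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

A token with no postag matches any word, so the transformed rule is broader than the original. A real transform would more likely map each STE tag to a standard LT tag.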

HTH


>
>> I don't want to make the rule set for the journal part of the standard
>> distribution, as they are quite specific. At the same time, I want to use
>> standard rules. So I simply want to open the additional rule set before I
>> make the check.
>
> This is similar to my situation. Also, when I check a text, I use more than
> one rule set. The STE rules that are on the simplified-english website are
> the 'core', as defined by the STEMG (www.asd-ste100.org). For each project,
> I have a grammar file and a disambiguation file
> (www.simplified-english.co.uk/design.html has a picture). When I check a
> text, I use both the core STE files and the project files.
>
> Some scenarios for the use of user files are as follows:
> * Single-user environment. User wants to use standalone LT and LT in
> OpenOffice. Currently, the user must copy/paste the files from the
> standalone directory to an OpenOffice directory. (Testrules is available
> only with standalone, thus, to develop user rules, that version of LT is
> always necessary.)
> * Multi-user environment. Grammar and disambiguation files are on a server.
> LT accesses these files only.
> * Multi-user environment. Grammar and disambiguation files are on a server.
> LT simultaneously accesses these files and project-specific grammar files
> that are on a user's computer.
>
> Possibly, one option is to split the disambiguation file into 2 parts. (And
> similarly with the grammar file.) The first part is only a 'wrapper', which
> refers to the default LT disambiguation file:
>
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE rules [
> <!ENTITY DefaultLTDisambiguation SYSTEM
> "org/languagetool/resource/en/disambiguation-default.xml">
> ]>
> <rules lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:noNamespaceSchemaLocation="http://svn.code.sf.net/p/languagetool/code/trunk/languagetool/languagetool-core/src/main/resources/org/languagetool/resource/disambiguation.xsd">
>
> &DefaultLTDisambiguation; <!-- The content of the current
> disambiguation.xml, but without the rules element -->
>
> <!--An explanation of how to add external entities goes here. -->
> </rules>
>
> 'Out of the box', LT works as usual. However, a user can edit the 'wrapper'
> disambiguation file to make LT use other rule sets.
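
As an illustration, here is a sketch of the wrapper after a user has added one project rule set. The entity name ProjectDisambiguation and the file path are hypothetical. As with the default file, the referenced project file would contain only rule elements, without an enclosing <rules> element.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rules [
<!ENTITY DefaultLTDisambiguation SYSTEM
"org/languagetool/resource/en/disambiguation-default.xml">
<!-- Hypothetical project file: rules only, no enclosing <rules> element. -->
<!ENTITY ProjectDisambiguation SYSTEM
"file:///C:/projects/acme/disambiguation-project.xml">
]>
<rules lang="en">
  &DefaultLTDisambiguation;
  &ProjectDisambiguation;
</rules>
```
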
>
> Possible problem 1: Because the user can install LT anywhere, the path for
> DefaultLTDisambiguation must be relative to the installation directory.
> However, a relative path can cause a validating XML editor to show an error
> and not open the file. If the user wants to use a validating XML editor, the
> solution is to replace the relative path with the full path in the file.
>
> Possible problem 2: Dave Pawson suggested that XInclude is preferable to
> entities (http://sourceforge.net/p/languagetool/mailman/message/32177932/).
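
A sketch of what the XInclude version of the wrapper could look like, offered as an assumption rather than as something LT supports. It works only if the program that reads the file is XInclude-aware, and LT's own loader may not be. The xpointer() scheme used here to include the children of each file's <rules> element, but not the element itself, is not supported by every XInclude processor. The file names are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rules lang="en" xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- Include the rules of the default file, but not its <rules> wrapper. -->
  <xi:include href="disambiguation-default.xml"
              xpointer="xpointer(/rules/*)"/>
  <!-- Hypothetical project-specific rules, included the same way. -->
  <xi:include href="disambiguation-project.xml"
              xpointer="xpointer(/rules/*)"/>
</rules>
```

Unlike the entity version, each included file stays a complete document with its own <rules> root, so it can still be opened and validated on its own.
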
>
> Possible problem 3: Each time that the user updates LT, the user must edit
> the 'wrapper' disambiguation file or copy/paste from the previous LT
> version. (But, with the integrate attribute, presumably a user must specify
> the location of the user file(s), so the same problem exists with that.)
>
> Regards,
>
> Mike Unwalla
> Contact: www.techscribe.co.uk/techw/contact.htm
>
>
> -----Original Message-----
> From: Marcin Milkowski [mailto:list-addr...@wp.pl]
> Sent: 05 April 2014 08:03
> To: languagetool-devel@lists.sourceforge.net
> Subject: Re: External rule files
>
> On 2014-04-04 19:24, Mike Unwalla wrote:
>> Hi All,
>>
>>>   I'm not sure why Mike Unwalla doesn't want to use our disambiguation
>>> rules
>>
>> I do not have a fundamental objection to using the LT disambiguation file
>> with the STE rules. Part of the reason that I now do not use the LT
>> disambiguation rules is historical.
>>
>> The LT disambiguation rules are not sufficient for the STE term checker.
>> Examples:
>> * A part-of-speech disambiguator is necessary (primarily for noun/verb
>> disambiguation).
>> * Each term that is in the STE specification must be specified in the
>> disambiguation rules with its approved and not-approved parts of speech.
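
To make the second point concrete, here is a hypothetical sketch of one such disambiguation rule. It assumes an STE dictionary entry that approves a word as a noun but not as a verb. The word, the rule ID and the postags STE_NOUN_APPROVED and STE_VERB_NOT_APPROVED are invented for illustration. Only the general rule/pattern/disambig structure follows the LT disambiguation file format; check disambiguation.xsd for the details.

```xml
<!-- Hypothetical rule: add an approved noun reading and a not-approved
     verb reading to the word "example" (tag names are invented). -->
<rule id="STE_EXAMPLE_POS" name="STE: example (noun approved, verb not approved)">
  <pattern>
    <token>example</token>
  </pattern>
  <disambig action="add">
    <wd lemma="example" pos="STE_NOUN_APPROVED"/>
    <wd lemma="example" pos="STE_VERB_NOT_APPROVED"/>
  </disambig>
</rule>
```
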
>>
>> When I started to write the STE disambiguation rules, I did not know how to
>> add rules to an external file
>> (http://wiki.languagetool.org/tips-and-tricks#toc2). Therefore, the
>> disambiguation file was in <installation path>\org\languagetool\resource\en.
>>
>> If I add the STE rules at the end of the LT disambiguation file, each time
>> that I update LT, I must copy/paste the STE rules into the new LT
>> disambiguation file. If some part of the new LT disambiguation file has an
>> effect on the STE rules, I must change the STE rules. Most of the rules in
>> the LT disambiguation file are not applicable to the STE rules. Therefore, my
>> easiest option was to write a completely new disambiguation file.
>
> I can see. But maybe the standard LT would benefit from your rules as well?
>
>>>   Maybe there's a place for a third value, when you want to use the existing
>>> language with its tokenization, tagger and all, but you don't want to use
>>> its rules (integrate="replace_only_rules").
>>
>> What is the difference between this third option and the second option,
>> where you replace the LT rules with customized rules?
>
> If you just replace the rules, you can still use the tagger and the
> tokenizer. Without this option, nothing like that is available, unless you
> copy your rules over the standard distribution.
>
>>
>>> I also added a new (now unused) attribute to the <rules> element, but the
>>> idea is simple: If you have integrate="add", then rules will be added,
>>
>> Why is this attribute necessary? What are the problems with an external rule
>> file (http://wiki.languagetool.org/tips-and-tricks#toc2) that the new
>> attribute solves?
>
> I am an editor of a scientific journal, and we have certain conventions
> that are not universal (mostly for the footnote references and
> consistent typography). I don't want to make the rule set for the
> journal part of the standard distribution, as they are quite specific. At
> the same time, I want to use standard rules. So I simply want to open
> the additional rule set before I make the check.
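
A sketch of how such a journal rule file could look with the proposed attribute. At the time of this thread, the integrate attribute was new and not yet used by LT, so this shows only the idea. The category, the rule ID and the rule itself (a house-style preference for "percent" over "per cent") are hypothetical.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical journal-specific rules, meant to be loaded in addition
     to the standard English rules (integrate="add" as proposed). -->
<rules lang="en" integrate="add">
  <category name="Journal house style">
    <rule id="JOURNAL_PERCENT" name="Journal style: percent">
      <pattern>
        <token>per</token>
        <token>cent</token>
      </pattern>
      <message>House style: write <suggestion>percent</suggestion>.</message>
      <example type="incorrect">A 5 <marker>per cent</marker> increase.</example>
      <example type="correct">A 5 percent increase.</example>
    </rule>
  </category>
</rules>
```

With integrate="replace_only_rules", as proposed earlier in the thread, the same file would instead replace the standard English rules while keeping LT's tokenizer and tagger.
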
>
> Regards,
> Marcin
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>



-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
