Hi All,

> But maybe the standard LT would benefit from your rules as well?

I am happy to donate all or some of the rules that I developed for STE issue
3. The most recent version of the rules is on
www.simplified-english.co.uk/installation.html. 

Most of the rules that I developed are specifically for STE and contain
customized postags. Example:
 <token postag_regexp="yes" postag="STE_VERB_LEXICAL_BASE|STE_TVb_BASE|STE_TVb_2_WORD_BASE|PROJECT_TVb_BASE|PROJECT_TVb_2_WORD_BASE"></token>

The STE rules must be 'fail safe'. To develop rules that give correct
results with all words in the English lexicon is difficult. 

> I don't want to make the rule set for the journal part of the standard
> distribution, as they are quite specific. At the same time, I want to use
> standard rules. So I simply want to open the additional rule set before I
> make the check.

This is similar to my situation. Also, when I check a text, I use more than
one rule set. The STE rules that are on the simplified-english website are
the 'core', as defined by the STEMG (www.asd-ste100.org). For each project,
I have a grammar file and a disambiguation file
(www.simplified-english.co.uk/design.html has a picture). When I check a
text, I use both the core STE files and the project files.

Some scenarios for the use of user files are as follows:
* Single-user environment. User wants to use standalone LT and LT in
OpenOffice. Currently, the user must copy/paste the files from the
standalone directory to an OpenOffice directory. (Testrules is available
only with standalone, thus, to develop user rules, that version of LT is
always necessary.)
* Multi-user environment. Grammar and disambiguation files are on a server.
LT accesses these files only.
* Multi-user environment. Grammar and disambiguation files are on a server.
LT simultaneously accesses these files and project-specific grammar files
that are on a user's computer.

Possibly, one option is to split the disambiguation file into 2 parts. (And
similarly with the grammar file.) The first part is only a 'wrapper', which
refers to the default LT disambiguation file:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rules [
<!ENTITY DefaultLTDisambiguation SYSTEM "org/languagetool/resource/en/disambiguation-default.xml">
]>
<rules lang="en" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://svn.code.sf.net/p/languagetool/code/trunk/languagetool/languagetool-core/src/main/resources/org/languagetool/resource/disambiguation.xsd">

&DefaultLTDisambiguation; <!-- The content of the current
disambiguation.xml, but without the rules element -->

<!--An explanation of how to add external entities goes here. -->
</rules>

'Out of the box', LT works as usual. However, a user can edit the 'wrapper'
disambiguation file to make LT use other rule sets. 
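
A minimal sketch of such an edit follows. The entity name ProjectDisambiguation and the file name project-disambiguation.xml are hypothetical and are not part of LT. The sketch assumes that the project file, like the default file, contains only rule content without the rules element.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rules [
<!ENTITY DefaultLTDisambiguation SYSTEM "org/languagetool/resource/en/disambiguation-default.xml">
<!-- Hypothetical user entity: the name and the path are assumptions -->
<!ENTITY ProjectDisambiguation SYSTEM "project-disambiguation.xml">
]>
<rules lang="en">
<!-- Default LT rules first, then the project rules -->
&DefaultLTDisambiguation;
&ProjectDisambiguation;
</rules>
```

The XML parser expands each entity reference in place, so the project rules are read after the default rules.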

Possible problem 1: Because the user can install LT anywhere, the path for
DefaultLTDisambiguation must be relative to the installation directory. But,
that can cause a validating XML editor to show an error and not open the
file. If the user wants to use a validating XML editor, the solution is to
edit the file with the full path.
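
As an illustration only, a declaration with a full path can look like the fragment that follows. The installation directory C:\LanguageTool is an assumption. The user must replace it with the actual location.

```xml
<!DOCTYPE rules [
<!-- Assumed installation directory: C:\LanguageTool (change to the actual location) -->
<!ENTITY DefaultLTDisambiguation SYSTEM "file:///C:/LanguageTool/org/languagetool/resource/en/disambiguation-default.xml">
]>
```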

Possible problem 2: Dave Pawson suggested that XInclude is preferable to
entities (http://sourceforge.net/p/languagetool/mailman/message/32177932/).
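
For comparison, a rough sketch of a wrapper that uses XInclude follows. The sketch assumes that LT reads the file with an XInclude-aware parser, which this thread does not confirm. The href value is a hypothetical path relative to the wrapper file.

```xml
<?xml version="1.0" encoding="utf-8"?>
<rules lang="en" xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- Assumption: the parser resolves href relative to this file -->
  <xi:include href="disambiguation-default.xml"/>
</rules>
```

One difference from entities: a file that is included as XML must be a well-formed document with a single root element. Thus, the included file cannot be a bare sequence of rules, and how LT would handle the included root element is an open question.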

Possible problem 3: Each time that the user updates LT, the user must edit
the 'wrapper' disambiguation file or copy/paste from the previous LT
version. (But, with the integrate attribute, presumably a user must specify
the location of the user file(s), so the same problem exists with that.)

Regards,

Mike Unwalla
Contact: www.techscribe.co.uk/techw/contact.htm 


-----Original Message-----
From: Marcin Milkowski [mailto:list-addr...@wp.pl] 
Sent: 05 April 2014 08:03
To: languagetool-devel@lists.sourceforge.net
Subject: Re: External rule files

On 2014-04-04 19:24, Mike Unwalla wrote:
> Hi All,
>
>> I'm not sure why Mike Unwalla doesn't want to use our disambiguation
>> rules
>
> I do not have a fundamental objection to using the LT disambiguation file
> with the STE rules. Part of the reason that I now do not use the LT
> disambiguation rules is historical.
>
> The LT disambiguation rules are not sufficient for the STE term checker.
> Examples:
> * A part-of-speech disambiguator is necessary (primarily for noun/verb
> disambiguation).
> * Each term that is in the STE specification must be specified in the
> disambiguation rules with its approved and not-approved parts of speech.
>
> When I started to write the STE disambiguation rules, I did not know how to
> add rules to an external file
> (http://wiki.languagetool.org/tips-and-tricks#toc2). Therefore, the
> disambiguation file was in <installation path>\org\languagetool\resource\en.
>
> If I add the STE rules at the end of the LT disambiguation file, each time
> that I update LT, I must copy/paste the STE rules into the new LT
> disambiguation file. If some part of the new LT disambiguation has an effect
> on the STE rules, I must change the STE rules. Most of the rules in the LT
> disambiguation file are not applicable to the STE rules. Therefore, my
> easiest option was to write a completely new disambiguation file.

I see. But maybe the standard LT would benefit from your rules as well?

>> Maybe there's place for a third value, when you want to use the existing
>> language with its tokenization, tagger and all, but you don't want to use
>> its rules (integrate="replace_only_rules").
>
> What is the difference between this third option and the second option,
> where you replace the LT rules with customized rules?

If you just replace the rules, you can use the tagger and the tokenizer. 
Without it, nothing like that is available, unless you copy your rules 
over the standard distribution.

>
>> I also added a new (now unused) attribute to <rules> element but the idea
>> is simple: If you have integrate="add", then rules will be added,
>
> Why is this attribute necessary? What are the problems with an external rule
> file (http://wiki.languagetool.org/tips-and-tricks#toc2) that the new
> attribute solves?

I am the editor of a scientific journal, and we have certain conventions
that are not universal (mostly for the footnote references and
consistent typography). I don't want to make the rule set for the
journal part of the standard distribution, as they are quite specific. At
the same time, I want to use standard rules. So I simply want to open
the additional rule set before I make the check.

Regards,
Marcin

------------------------------------------------------------------------------
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel

