Thanks Dave.

I am not an XML expert. I understand the phrase 'define a transform' to
mean 'specify a mapping'. If my understanding is not correct, please tell
me.

There is not a 1:1 mapping between the term checker postags and the LT
postags. Thus, I cannot define a transform for all the postags, but I can
define a transform for some of them. However, there are possible problems as
the examples below show.

Example 1. Ignoring technical verbs that LT does not 'know', a verb that has
the postag STE_VERB_LEXICAL_BASE usually has the LT postag VB. However,
although the verb 'do' has the LT postag VB, it does not have the postag
STE_VERB_LEXICAL_BASE. (It has the postags STE_VERB_AUXILIARY_DO and
STE_VERB_AUXILIARY_CAN_DO_MUST_WILL.) Thus, without excluding 'do' from a
rule, you cannot map STE_VERB_LEXICAL_BASE to VB.
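
As a rough sketch of Example 1, a partial mapping with an exclusion can look
like the Python code that follows. The code is only an illustration. The names
PARTIAL_MAP, EXCLUDED_WORDS, and to_lt_postag are hypothetical. They are not
part of LT or of the term checker. Only the one pair of postags from Example 1
is included.

```python
# Hypothetical partial mapping from term checker postags to LT postags.
# Only the pair from Example 1 is shown; a real mapping would be larger.
PARTIAL_MAP = {"STE_VERB_LEXICAL_BASE": "VB"}

# Words that have the mapped LT postag, but that do not have the term
# checker postag. A rule that uses the mapping must exclude these words.
EXCLUDED_WORDS = {"STE_VERB_LEXICAL_BASE": {"do"}}


def to_lt_postag(ste_postag):
    """Return (lt_postag, words_to_exclude), or None if no mapping is defined."""
    lt_postag = PARTIAL_MAP.get(ste_postag)
    if lt_postag is None:
        # No 1:1 mapping is known for this term checker postag.
        return None
    return lt_postag, sorted(EXCLUDED_WORDS.get(ste_postag, set()))
```

With this sketch, to_lt_postag("STE_VERB_LEXICAL_BASE") gives the LT postag VB
and the word 'do' as an exclusion. A postag that is not in the mapping gives
None.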

Example 2. With an approved 2-word plural noun, the first word has the
postag STE_TN_NOUN_MULTI_WORD_PLURAL_1 and the second word has the postag
STE_TN_NOUN_MULTI_WORD_PLURAL_2. (TN is an abbreviation of 'Technical Name',
which is a term from the STE specification.) The 3 terms that follow are
approved 2-word nouns. The LT postags that relate to nouns are different for
the first word. The LT postags for nouns are in brackets:
circuit breakers (NN, NNS)
duty cycles (NN:UN, NNS)
operating systems (-, NNS)
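
The Python sketch that follows groups the LT noun postags from Example 2 by
term checker postag. It is only an illustration of why the first word does not
have a 1:1 mapping. The names OBSERVED and lt_postags_by_ste_postag are
hypothetical. The sketch assumes that '-' in the list above means 'no LT noun
postag', which is shown as None.

```python
from collections import defaultdict

# (word, term checker postag, LT noun postag) for the 3 terms in Example 2.
# None is an assumption for '-' (no LT noun postag for the word).
OBSERVED = [
    ("circuit", "STE_TN_NOUN_MULTI_WORD_PLURAL_1", "NN"),
    ("breakers", "STE_TN_NOUN_MULTI_WORD_PLURAL_2", "NNS"),
    ("duty", "STE_TN_NOUN_MULTI_WORD_PLURAL_1", "NN:UN"),
    ("cycles", "STE_TN_NOUN_MULTI_WORD_PLURAL_2", "NNS"),
    ("operating", "STE_TN_NOUN_MULTI_WORD_PLURAL_1", None),
    ("systems", "STE_TN_NOUN_MULTI_WORD_PLURAL_2", "NNS"),
]


def lt_postags_by_ste_postag(rows):
    """Collect the set of LT postags seen for each term checker postag."""
    seen = defaultdict(set)
    for _word, ste_postag, lt_postag in rows:
        seen[ste_postag].add(lt_postag)
    return dict(seen)
```

For these 3 terms, the second word always has the LT postag NNS, but the first
word has 3 different results. Thus, only the second postag is a candidate for
a direct mapping.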

In a related e-mail, Marcin wrote: Hm, that means I will have to look at
them and manually create a generic version, if that only is possible. That
is already a big help for me, as it's not trivial to find regularities that
create good disambiguation rules.

Marcin, if a partial mapping helps you, let me know, and I will define one.

Regards,

Mike Unwalla
Contact: www.techscribe.co.uk/techw/contact.htm 

-----Original Message-----
From: Dave Pawson [mailto:dave.paw...@gmail.com] 
Sent: 05 April 2014 19:50
To: development discussion for LanguageTool
Subject: Re: External rule files

On 5 April 2014 17:11, Mike Unwalla <m...@techscribe.co.uk> wrote:
<snip>
> Most of the rules that I developed are specifically for STE and contain
> customized postags. Example:
>  <token postag_regexp="yes"
>  postag="STE_VERB_LEXICAL_BASE|STE_TVb_BASE|STE_TVb_2_WORD_BASE|PROJECT_TVb_BASE|PROJECT_TVb_2_WORD_BASE"></token>
>
> The STE rules must be 'fail safe'. To develop rules that give correct
> results with all words in the English lexicon is difficult.

If you can define a transform I'll write a stylesheet to do it
(perhaps leaving the extra tags as comments)

HTH

<snip>


