On 2014-04-07 15:58, Dave Pawson wrote:
> On 7 April 2014 14:43, Mike Unwalla <m...@techscribe.co.uk> wrote:
>>> and how you want these in the output, we can start from there.
>>
>> I think that we have a miscommunication. I don't need a mapping from the STE
>> postags to the LT postags. I created the STE postags for the term checker
>> because I can't do what I want to do with only the LT postags.
>
> Yes, I think we do have a difference of understanding.
>
>>
>>> I need the XML source markup (is the source XML?)
>>
>> The source is XML. It is available from
>> www.simplified-english.co.uk/installation.html in the file
>> term-checker-evaluation-yyyy-mm-dd.zip (I do not give the current file name
>> in this e-mail because the .zip file name contains a date, and I put only
>> the most recent version of the file on the website.)
>>
>> But, if 'source markup' means a marked up document in which terms are
>> annotated with a postag, then no, I do not have source markup.
>
> No, I was thinking of a definition of the valid syntax of your format:
> either a schema or a DTD.
>    Examples of marked-up text would suffice; it would just take longer.
>
>
>>
>>> I'm not sure I understand this... If you can express the conditions, then
>>> I can write a transform based on those conditions.
>>
>> Yes. (But I don't understand why someone would want this transformation.)
>
> My assumption. I may be wrong.
> You have many files marked up using schema A. (or simply a tagset A)
> You want to transform these files to use a more recent LT tagset.
>
> If we can share an understanding of the tagset, and how to get from one
> to the other, I can help automate it.
>

No, Mike does not want to transform or retag his files. He's using a 
specialized tagset, and that's fine. I simply want to steal some of his 
disambiguation rules, but for that, I'll have to use my brain instead of 
my Ctrl+C/Ctrl+V ;)
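
To make the quoted discussion concrete, here is a minimal Python sketch of the kind of partial mapping being discussed. It is only an illustration: the function names `map_postag` and `ste_tags_for_vb` and the two dictionaries are hypothetical, and only the tag names and the `<unknown original="...">` marker come from the messages quoted below.

```python
# Hypothetical partial mapping from STE term-checker postags to LT postags.
# Only this one pair is mentioned in the thread, and it holds "usually",
# not always.
STE_TO_LT = {"STE_VERB_LEXICAL_BASE": "VB"}

# Words whose LT postag is VB but whose STE postags are not
# STE_VERB_LEXICAL_BASE. 'do' is the only example given in the thread.
VB_EXCEPTIONS = {
    "do": ["STE_VERB_AUXILIARY_DO", "STE_VERB_AUXILIARY_CAN_DO_MUST_WILL"],
}


def map_postag(ste_tag: str) -> str:
    """Return the LT postag for an STE postag, or an <unknown> marker."""
    lt_tag = STE_TO_LT.get(ste_tag)
    if lt_tag is None:
        # No known LT equivalent: keep the source tag in an 'original' attribute.
        return f'<unknown original="{ste_tag}"/>'
    return lt_tag


def ste_tags_for_vb(token: str) -> list[str]:
    """Return the STE postags for a token that LT tags as VB."""
    return VB_EXCEPTIONS.get(token.lower(), ["STE_VERB_LEXICAL_BASE"])
```

The exception table is the reason a plain tag-to-tag lookup is not enough: the word itself has to be consulted, which is closer to a disambiguation rule than to a simple transform.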

Best,
Marcin

>
>
>
>>
>>> E.g. (guessing)
>>>    input <STE_VERB_LEXICAL_BASE> -> <VB>
>>>
>>> input <do>   -> <VB>
>>> Although that sounds too simple?
>>
>> In principle, yes. But the mappings are much more complex. Also, there are
>> verbs that LT does not 'know' as verbs, such as the approved verb 'safety'.
>> And there is the not-approved verb 'safety-clip', for which there is no LT
>> postag (except for what it finds with the chunker
>> [http://wiki.languagetool.org/using-chunks]).
>
> No problem. For 'unknowns' I will mark the items as <unknown original="xxx">
> where xxx is the source markup.
>
>>
>>> then maps to ... Again I do not understand the English explanation,
>>> perhaps an XML example?
>>> "following terms" - are these XML children (nested within the parent)
>>> or siblings?
>>
>> Sorry, I don't know how to give an XML example. There is no formal XML
>> specification for the STE postags. I used the method that is in 'Adding only
>> POS tags or tokens'
>> (http://wiki.languagetool.org/developing-a-disambiguator#toc8).
>
> The link points to XML? If that is not available, then XSLT will
> not help?
>
> regards
>
> (Oh the joys of miscommunication :-)
>
> Dave P
>
>
>
>
>>
>> -----Original Message-----
>> From: Dave Pawson [mailto:dave.paw...@gmail.com]
>> Sent: 07 April 2014 12:55
>> To: development discussion for LanguageTool
>> Subject: Re: External rule files
>>
>> On 7 April 2014 11:08, Mike Unwalla <m...@techscribe.co.uk> wrote:
>>> Thanks Dave.
>>>
>>>   I am not an XML expert. I understand the phrase 'define a transform' to
>>> mean 'specify a mapping'. If my understanding is not correct, please tell
>>> me.
>>
>> That's right.
>> As a trial, if you give me a few examples,
>> and how you want these in the output, we can start from there.
>>
>>
>>>
>>> There is not a 1:1 mapping between the term checker postags and the LT
>>> postags. Thus, I cannot define a transform for all the postags, but I can
>>> define a transform for some of them. However, there are possible problems
>>> as the examples below show.
>>
>> I need the XML source markup (is the source XML?)
>>    XSLT works on XML in and XML out.
>>
>>
>>>
>>> Example 1. Ignoring technical verbs that LT does not 'know', a verb that
>>> has the postag STE_VERB_LEXICAL_BASE usually has the LT postag VB. However,
>>> although the verb 'do' has the LT postag VB, it does not have the postag
>>> STE_VERB_LEXICAL_BASE. (It has the postags STE_VERB_AUXILIARY_DO and
>>> STE_VERB_AUXILIARY_CAN_DO_MUST_WILL.) Thus, without excluding 'do' from a
>>> rule, you cannot map STE_VERB_LEXICAL_BASE to VB.
>>
>> I'm not sure I understand this... If you can express the conditions, then I
>> can write a transform based on those conditions.
>> E.g. (guessing)
>>    input <STE_VERB_LEXICAL_BASE> -> <VB>
>>
>> input <do>   -> <VB>
>>   Although that sounds too simple?
>>
>>
>>
>>
>>>
>>> Example 2. With an approved 2-word plural noun, the first word has the
>>> postag STE_TN_NOUN_MULTI_WORD_PLURAL_1 and the second word has the postag
>>> STE_TN_NOUN_MULTI_WORD_PLURAL_2. (TN is an abbreviation of 'Technical
>>> Name', which is a term from the STE specification.) The 3 terms that follow are
>>> approved 2-word nouns. The LT postags that relate to nouns are different
>>> for the first word. The LT postags for nouns are in brackets:
>>> circuit breakers (NN, NNS)
>>> duty cycles (NN:UN, NNS)
>>> operating systems (-, NNS)
>>
>> <STE_TN_NOUN_MULTI_WORD_PLURAL_1> + <STE_TN_NOUN_MULTI_WORD_PLURAL_2>
>> (written as
>> <xsl:template match="STE_TN_NOUN_MULTI_WORD_PLURAL_1[following-sibling::STE_TN_NOUN_MULTI_WORD_PLURAL_2[1]]">)
>>
>> then maps to ... Again I do not understand the English explanation,
>> perhaps an XML example?
>> "following terms" - are these XML children (nested within the parent)
>> or siblings?
>> <p>
>>    <child/>
>> </p>
>> <sibling/>
>>
>>
>>
>> regards
>>
>>
>>
>>
>>
>> --
>> Dave Pawson
>> XSLT XSL-FO FAQ.
>> Docbook FAQ.
>> http://www.dpawson.co.uk
>>
>> ------------------------------------------------------------------------------
>> Put Bad Developers to Shame
>> Dominate Development with Jenkins Continuous Integration
>> Continuously Automate Build, Test & Deployment
>> Start a new project now. Try Jenkins in the cloud.
>> http://p.sf.net/sfu/13600_Cloudbees_APR
>> _______________________________________________
>> Languagetool-devel mailing list
>> Languagetool-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel
>>
>
>
>

