On 2014-04-07 15:58, Dave Pawson wrote:
> On 7 April 2014 14:43, Mike Unwalla <m...@techscribe.co.uk> wrote:
>>> and how you want these in the output, we can start from there.
>>
>> I think that we have a miscommunication. I don't need a mapping from the STE
>> postags to the LT postags. I created the STE postags for the term checker
>> because I can't do what I want to do with only the LT postags.
>
> Yes, I think we do have a difference of understanding.
>
>>
>>> I need the XML source markup (is the source XML?)
>>
>> The source is XML. It is available from
>> www.simplified-english.co.uk/installation.html in the file
>> term-checker-evaluation-yyyy-mm-dd.zip (I do not give the current file name
>> in this e-mail because the .zip file name contains a date, and I put only
>> the most recent version of the file on the website.)
>>
>> But, if 'source markup' means a marked up document in which terms are
>> annotated with a postag, then no, I do not have source markup.
>
> No, I was thinking of the valid syntax of your form to that which is required?
> Either a schema or DTD.
> Examples of marked up text would suffice, just take longer?
>
>>
>>> I'm not sure I understand this... If you can express the conditions, then I can
>>> write a transform based on those conditions.
>>
>> Yes. (But I don't understand why someone would want this transformation.)
>
> My assumption. I may be wrong.
> You have many files marked up using schema A. (or simply a tagset A)
> You want to transform these files to use a more recent LT tagset.
>
> If we can share an understanding of the tagset, and how to get from one
> to the other, I can help automate it.
>
No, Mike does not want to transform or retag his files. He's using a
specialized tagset, and that's fine. I simply want to steal some of his
disambiguation rules, but for that, I'll have to use my brain instead of
my Ctrl+C/Ctrl+V ;)

Best,
Marcin

>
>>
>>> E.g. (guessing)
>>> input <STE_VERB_LEXICAL_BASE> -> <VB>
>>>
>>> input <do> -> <VB>
>>> Although that sounds too simple?
>>
>> In principle, yes. But the mappings are much more complex. Also, there are
>> verbs that LT does not 'know' as verbs, such as the approved verb 'safety'.
>> And there is the not-approved verb 'safety-clip', for which there is no LT
>> postag (except for what it finds with the chunker
>> [http://wiki.languagetool.org/using-chunks]).
>
> No problem. For 'unknowns' I will mark the items as <unknown original="xxx">
> where xxx is the source markup.
>
>>
>>> then maps to ... Again I do not understand the English explanation,
>>> perhaps an XML example?
>>> "following terms" - are these XML children (nested within the parent)
>>> or siblings?
>>
>> Sorry, I don't know how to give an XML example. There is no formal XML
>> specification for the STE postags. I used the method that is in 'Adding only
>> POS tags or tokens'
>> (http://wiki.languagetool.org/developing-a-disambiguator#toc8).
>
> The link points to XML? If that is not available, then XSLT will
> not help?
>
> regards
>
> (Oh the joys of miscommunication :-)
>
> Dave P
>
>>
>> -----Original Message-----
>> From: Dave Pawson [mailto:dave.paw...@gmail.com]
>> Sent: 07 April 2014 12:55
>> To: development discussion for LanguageTool
>> Subject: Re: External rule files
>>
>> On 7 April 2014 11:08, Mike Unwalla <m...@techscribe.co.uk> wrote:
>>> Thanks Dave.
>>>
>>> I am not an XML expert. I understand the phrase 'define a transform' to
>>> mean 'specify a mapping'. If my understanding is not correct, please tell
>>> me.
>>
>> That's right.
>> As a trial, if you give me a few examples,
>> and how you want these in the output, we can start from there.
>>
>>>
>>> There is not a 1:1 mapping between the term checker postags and the LT
>>> postags. Thus, I cannot define a transform for all the postags, but I can
>>> define a transform for some of them. However, there are possible problems as
>>> the examples below show.
>>
>> I need the XML source markup (is the source XML?)
>> XSLT works on XML in and XML out.
>>
>>>
>>> Example 1. Ignoring technical verbs that LT does not 'know', a verb that has
>>> the postag STE_VERB_LEXICAL_BASE usually has the LT postag VB. However,
>>> although the verb 'do' has the LT postag VB, it does not have the postag
>>> STE_VERB_LEXICAL_BASE. (It has the postags STE_VERB_AUXILIARY_DO and
>>> STE_VERB_AUXILIARY_CAN_DO_MUST_WILL.) Thus, without excluding 'do' from a
>>> rule, you cannot map STE_VERB_LEXICAL_BASE to VB.
>>
>> I'm not sure I understand this... If you can express the conditions, then I can
>> write a transform based on those conditions.
>> E.g. (guessing)
>> input <STE_VERB_LEXICAL_BASE> -> <VB>
>>
>> input <do> -> <VB>
>> Although that sounds too simple?
>>
>>>
>>> Example 2. With an approved 2-word plural noun, the first word has the
>>> postag STE_TN_NOUN_MULTI_WORD_PLURAL_1 and the second word has the postag
>>> STE_TN_NOUN_MULTI_WORD_PLURAL_2. (TN is an abbreviation of 'Technical Name',
>>> which is a term from the STE specification.) The 3 terms that follow are
>>> approved 2-word nouns. The LT postags that relate to nouns are different for
>>> the first word. The LT postags for nouns are in brackets:
>>> circuit breakers (NN, NNS)
>>> duty cycles (NN:UN, NNS)
>>> operating systems (-, NNS)
>>
>> <STE_TN_NOUN_MULTI_WORD_PLURAL_1> + <STE_TN_NOUN_MULTI_WORD_PLURAL_2>
>> (written as
>> <xsl:template
>> match="STE_TN_NOUN_MULTI_WORD_PLURAL_1[following-sibling::STE_TN_NOUN_MULTI_WORD_PLURAL_2[1]]">
>>
>> then maps to ...
>> Again I do not understand the English explanation,
>> perhaps an XML example?
>> "following terms" - are these XML children (nested within the parent)
>> or siblings?
>> <p>
>>   <child/>
>> </p>
>> <sibling/>
>>
>> regards
>>
>> --
>> Dave Pawson
>> XSLT XSL-FO FAQ.
>> Docbook FAQ.
>> http://www.dpawson.co.uk
>>
>> ------------------------------------------------------------------------------
>> Put Bad Developers to Shame
>> Dominate Development with Jenkins Continuous Integration
>> Continuously Automate Build, Test & Deployment
>> Start a new project now. Try Jenkins in the cloud.
>> http://p.sf.net/sfu/13600_Cloudbees_APR
>> _______________________________________________
>> Languagetool-devel mailing list
>> Languagetool-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/languagetool-devel