On 7 April 2014 14:43, Mike Unwalla <m...@techscribe.co.uk> wrote:

>> and how you want these in the output, we can start from there.
>
> I think that we have a miscommunication. I don't need a mapping from the STE
> postags to the LT postags. I created the STE postags for the term checker
> because I can't do what I want to do with only the LT postags.

Yes, I think we do have a difference of understanding.

>> I need the XML source markup (is the source XML?)
>
> The source is XML. It is available from
> www.simplified-english.co.uk/installation.html in the file
> term-checker-evaluation-yyyy-mm-dd.zip (I do not give the current file name
> in this e-mail because the .zip file name contains a date, and I put only
> the most recent version of the file on the website.)
>
> But, if 'source markup' means a marked up document in which terms are
> annotated with a postag, then no, I do not have source markup.

No, I was thinking of a definition of the valid syntax of your format:
either a schema or a DTD. Examples of marked-up text would suffice; it would
just take longer.

>> I'm not sure I understand this... If you can express the conditions, then I can
>> write a transform based on those conditions.
>
> Yes. (But I don't understand why someone would want this transformation.)

My assumption (I may be wrong): you have many files marked up using schema A
(or simply a tagset A), and you want to transform these files to use a more
recent LT tagset. If we can share an understanding of the tagsets, and how to
get from one to the other, I can help automate it.

>> E.g. (guessing)
>> input <STE_VERB_LEXICAL_BASE> -> <VB>
>>
>> input <do> -> <VB>
>> Although that sounds too simple?
>
> In principle, yes. But the mappings are much more complex. Also, there are
> verbs that LT does not 'know' as verbs, such as the approved verb 'safety'.
> And there is the not-approved verb 'safety-clip', for which there is no LT
> postag (except for what it finds with the chunker
> [http://wiki.languagetool.org/using-chunks]).

No problem. For 'unknowns' I will mark the items as
<unknown original="xxx">
where xxx is the source markup.

>> then maps to ... Again I do not understand the English explanation,
>> perhaps an XML example?
>> "following terms" - are these XML children (nested within the parent)
>> or siblings?
>
> Sorry, I don't know how to give an XML example. There is no formal XML
> specification for the STE postags. I used the method that is in 'Adding only
> POS tags or tokens'
> (http://wiki.languagetool.org/developing-a-disambiguator#toc8).

Does the link point to XML? If XML is not available, then XSLT will not help.

regards (Oh the joys of miscommunication :-)
Dave P

> -----Original Message-----
> From: Dave Pawson [mailto:dave.paw...@gmail.com]
> Sent: 07 April 2014 12:55
> To: development discussion for LanguageTool
> Subject: Re: External rule files
>
> On 7 April 2014 11:08, Mike Unwalla <m...@techscribe.co.uk> wrote:
>> Thanks Dave.
>>
>> I am not an XML expert. I understand the phrase 'define a transform' to
>> mean 'specify a mapping'. If my understanding is not correct, please tell
>> me.
>
> That's right.
> As a trial, if you give me a few examples,
> and how you want these in the output, we can start from there.
>
>> There is not a 1:1 mapping between the term checker postags and the LT
>> postags. Thus, I cannot define a transform for all the postags, but I can
>> define a transform for some of them. However, there are possible problems as
>> the examples below show.
>
> I need the XML source markup (is the source XML?)
> XSLT works on XML in and XML out.
>
>> Example 1. Ignoring technical verbs that LT does not 'know', a verb that has
>> the postag STE_VERB_LEXICAL_BASE usually has the LT postag VB. However,
>> although the verb 'do' has the LT postag VB, it does not have the postag
>> STE_VERB_LEXICAL_BASE. (It has the postags STE_VERB_AUXILIARY_DO and
>> STE_VERB_AUXILIARY_CAN_DO_MUST_WILL.) Thus, without excluding 'do' from a
>> rule, you cannot map STE_VERB_LEXICAL_BASE to VB.
>
> I'm not sure I understand this... If you can express the conditions, then I can
> write a transform based on those conditions.
> E.g. (guessing)
> input <STE_VERB_LEXICAL_BASE> -> <VB>
>
> input <do> -> <VB>
> Although that sounds too simple?
>
>> Example 2. With an approved 2-word plural noun, the first word has the
>> postag STE_TN_NOUN_MULTI_WORD_PLURAL_1 and the second word has the postag
>> STE_TN_NOUN_MULTI_WORD_PLURAL_2. (TN is an abbreviation of 'Technical Name',
>> which is a term from the STE specification.) The 3 terms that follow are
>> approved 2-word nouns. The LT postags that relate to nouns are different for
>> the first word. The LT postags for nouns are in brackets:
>> circuit breakers (NN, NNS)
>> duty cycles (NN:UN, NNS)
>> operating systems (-, NNS)
>
> <STE_TN_NOUN_MULTI_WORD_PLURAL_1> + <STE_TN_NOUN_MULTI_WORD_PLURAL_2>
> (written as
> <xsl:template
> match="STE_TN_NOUN_MULTI_WORD_PLURAL_1[following-sibling::STE_TN_NOUN_MULTI_WORD_PLURAL_2[1]]">
>
> then maps to ... Again I do not understand the English explanation,
> perhaps an XML example?
> "following terms" - are these XML children (nested within the parent)
> or siblings?
> <p>
>   <child/>
> </p>
> <sibling/>
>
> regards
>
> --
> Dave Pawson
> XSLT XSL-FO FAQ.
> Docbook FAQ.
> http://www.dpawson.co.uk
>
> ------------------------------------------------------------------------------
> Put Bad Developers to Shame
> Dominate Development with Jenkins Continuous Integration
> Continuously Automate Build, Test & Deployment
> Start a new project now. Try Jenkins in the cloud.
> http://p.sf.net/sfu/13600_Cloudbees_APR
> _______________________________________________
> Languagetool-devel mailing list
> Languagetool-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/languagetool-devel

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
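
The kind of mapping discussed in this thread (some STE postags map to an LT postag, and anything without a known mapping is wrapped as <unknown original="...">) can be sketched in a few lines. This is only an illustration, not code from either correspondent. It is written in Python rather than XSLT so that it runs with the standard library alone. It assumes a hypothetical input in which each term is an XML element named after its STE postag; the thread states that no such marked-up source exists. The mapping table STE_TO_LT and the function transform are invented for the example. The two entries in the table follow the examples in the thread: STE_VERB_LEXICAL_BASE usually corresponds to VB, and the second word of the 2-word plural nouns is NNS in all three cases. The first word is left unmapped because its LT postag varies.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately incomplete mapping table (an assumption for
# illustration). Postags without a reliable 1:1 mapping are left out on purpose.
STE_TO_LT = {
    "STE_VERB_LEXICAL_BASE": "VB",
    "STE_TN_NOUN_MULTI_WORD_PLURAL_2": "NNS",
}


def transform(elem):
    """Return a copy of elem with STE element names mapped to LT names.

    Unmapped STE_* elements become <unknown original="..."> so that the
    source markup is not lost. All other elements are copied unchanged.
    """
    if elem.tag in STE_TO_LT:
        out = ET.Element(STE_TO_LT[elem.tag])
    elif elem.tag.startswith("STE_"):
        out = ET.Element("unknown", {"original": elem.tag})
    else:
        out = ET.Element(elem.tag, dict(elem.attrib))
    # Preserve the text inside the element and the text that follows it.
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(transform(child))
    return out


# Hypothetical input document, invented for this sketch.
source = ET.fromstring(
    "<p><STE_VERB_LEXICAL_BASE>remove</STE_VERB_LEXICAL_BASE> the "
    "<STE_TN_NOUN_MULTI_WORD_PLURAL_1>circuit</STE_TN_NOUN_MULTI_WORD_PLURAL_1> "
    "<STE_TN_NOUN_MULTI_WORD_PLURAL_2>breakers</STE_TN_NOUN_MULTI_WORD_PLURAL_2></p>"
)
result = ET.tostring(transform(source), encoding="unicode")
print(result)
```

A lookup table plus a fallback is the simplest possible form. Conditions such as "exclude 'do'" from Example 1 would need extra tests on the element text, which is where the mappings stop being simple.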