Hi again,

"This left-to-right, longest-match way of functioning makes it very easy to treat (variable or invariable) multi-word units (MWUs), for input: if a MWU is not complete, the acceptance state reached will correspond to a smaller unit, which will be clipped and whose transduction will be output (for example, if the dictionary contains 'George' and the MWU 'George Washington', when reading 'George W. Bush' the MWU 'George Washington' will abort at the '.', the transduction of 'George' will be output and the analyser will be ready to process the remaining text, ' W. Bush')."
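
To make the quoted behaviour concrete, here is a minimal Python sketch of left-to-right longest match. It is not Apertium's actual code: the names `longest_match` and `tokenise` and the toy lexicon are made up for illustration, and two rules are assumptions of the sketch only, namely that a match must end at a non-alphanumeric character and that an abbreviation such as "t.ex." is simply listed as one more lexicon entry.

```python
def longest_match(text, start, lexicon):
    """Return the longest lexicon entry beginning at `start`, or None."""
    best = None
    for end in range(start + 1, len(text) + 1):
        candidate = text[start:end]
        # Assumption of this sketch: a unit may only end at a boundary,
        # i.e. at the end of the text or before a non-alphanumeric character.
        at_boundary = end == len(text) or not text[end].isalnum()
        if candidate in lexicon and at_boundary:
            best = candidate  # remember the last acceptance state reached
        if not any(entry.startswith(candidate) for entry in lexicon):
            break  # no entry can continue from here: the longer unit aborts
    return best


def tokenise(text, lexicon):
    """Segment `text` left to right, always taking the longest known unit."""
    units = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # blanks between units are skipped in this toy version
            continue
        unit = longest_match(text, pos, lexicon)
        if unit is None:
            # Unknown material: take a run of letters/digits,
            # or a single punctuation character.
            end = pos + 1
            if text[pos].isalnum():
                while end < len(text) and text[end].isalnum():
                    end += 1
            unit = text[pos:end]
        units.append(unit)
        pos += len(unit)
    return units


# Hypothetical toy lexicon: one word, one MWU, one abbreviation.
LEXICON = {"George", "George Washington", "t.ex."}

print(tokenise("George W. Bush", LEXICON))
print(tokenise("George Washington, t.ex.", LEXICON))
print(tokenise("t.ex.", {"George"}))
```

With this lexicon the first call yields "George", "W", ".", "Bush": the attempt at "George Washington" aborts at the ".", and the shorter unit "George" is output. In the second call both the MWU and the abbreviation come out as single units. In the third call, where "t.ex." is not listed, it falls apart into "t", ".", "ex", ".".
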

In that case, how does it work with abbreviations? Does the transducer try the MWU "t.ex."? Does it know not to split on punctuation marks when they can occur inside a MWU? If so, I could name the tags for the abbreviations whatever I find suitable, couldn't I? And multi-word expressions containing other punctuation marks would work as well, I presume.

Yours,
Per Tunedal

On Tue, Nov 13, 2012, at 9:57, Francis Tyers wrote:
> On Tue, 13 Nov 2012 at 09:31 +0100, Per Tunedal wrote:
--snip--
> > Secondly, I'm curious how Apertium handles word splitting. The periods in
> > abbreviations must be handled somehow, mustn't they? I was thinking of
> > simple alignment scripts, like Bligner, or even OmegaT. They
> > split sentences at punctuation marks, so they keep a list of what not
> > to split, i.e. the abbreviations for the languages concerned. That's
> > why I started this thread. How does Apertium know not to split? Does the
> > tagger look for the tag <abbr>? Is this a standard solution for
> > Apertium? Or do I have to add it in each language pair somehow?
>
> Left-to-right longest match with tokenise-as-you-analyse.
>
> http://www.dlsi.ua.es/~mlf/docum/garrido02p.pdf
>
> Section 3 describes it.
>
> Fran
>
>
> ------------------------------------------------------------------------------
> Monitor your physical, virtual and cloud infrastructure from a single
> web console. Get in-depth insight into apps, servers, databases, vmware,
> SAP, cloud infrastructure, etc. Download 30-day Free Trial.
> Pricing starts from $795 for 25 servers or applications!
> http://p.sf.net/sfu/zoho_dev2dev_nov
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
