Kevin Brubeck Unhammer <[email protected]> writes:

> Hi,
>
> I notice that soft/hidden hyphens (­) can split words, e.g. in
>
> Jespersen
>
> there's a soft hyphen between n and t, but it should be analysed as one
Oops, between r and s!

> word. I've noticed this a lot in web pages, I guess a lot of news sites
> and such use programs that hyphenate using that character.
>
> The problem is, if we don't have the soft hyphen in <alphabet>, we get
> two lexical units; if we have it there, we get one unknown word, even if
> "Jespersen" is in the dix.
>
> Is it possible to use ACX files[1] or something to say that any soft hyphen
> can be skipped? It seems sort of similar to what ACX does at least …
>
>
> [1] http://wiki.apertium.org/wiki/Acx
>
>
> -Kevin

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
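A toy Python sketch that reproduces the two outcomes described in the message and the behaviour being asked for. It assumes a simplified tokenizer where any character outside the alphabet acts as a separator; this is not how lttoolbox is actually implemented, and `tokenize`, `analyse`, `DICTIONARY` and the `skip` set are all hypothetical. The `skip` set models "a soft hyphen can be skipped": the character stays in the token's surface form but is ignored when looking the word up.

```python
import string

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, invisible unless a line breaks there

# Hypothetical one-entry dictionary standing in for a compiled .dix;
# the analysis tag is illustrative only.
DICTIONARY = {"Jespersen": "Jespersen<np>"}


def tokenize(text, alphabet):
    """Toy tokenizer: maximal runs of alphabet characters form tokens,
    any other character acts as a separator (not lttoolbox's real logic)."""
    tokens, current = [], []
    for ch in text:
        if ch in alphabet:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def analyse(text, alphabet, skip=frozenset()):
    """Look each token up in DICTIONARY. Characters in `skip` stay in the
    surface form but are dropped from the lookup key. Unknown tokens are
    marked with a leading '*'."""
    results = []
    for tok in tokenize(text, alphabet):
        key = "".join(ch for ch in tok if ch not in skip)
        results.append(DICTIONARY.get(key, "*" + tok))
    return results


if __name__ == "__main__":
    word = "Jesper" + SOFT_HYPHEN + "sen"
    letters = set(string.ascii_letters)
    # 1. Soft hyphen not in the alphabet: the word is split into two units.
    print(analyse(word, letters))
    # 2. Soft hyphen in the alphabet: one token, but no dictionary match.
    print(analyse(word, letters | {SOFT_HYPHEN}))
    # 3. Soft hyphen in the alphabet and skipped at lookup: the entry is found.
    print(analyse(word, letters | {SOFT_HYPHEN}, skip={SOFT_HYPHEN}))
```

Keeping the soft hyphen in the surface form and stripping it only from the lookup key means the original text can still be reproduced unchanged on output, which is roughly what an ACX-style "ignore this character" rule would need to preserve.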
