Someone once developed an algorithm called *KUCut* to insert zero-width spaces (ZWSP) into Thai text.
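To give the flavour of what such a tool does, here is a minimal sketch in Python. It is *not* KUCut's own method. It swaps in simple greedy longest-match ("maximal matching") segmentation against a tiny, hypothetical word list, then joins the words with U+200B ZERO WIDTH SPACE so that a renderer may wrap the line between them. The names `segment_longest_match` and `insert_zwsp` are illustrative only.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE: an invisible line-break opportunity


def segment_longest_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation (a simple stand-in, not KUCut)."""
    max_len = max((len(w) for w in dictionary if w), default=1)
    words: list[str] = []
    pending = ""  # run of characters not covered by any dictionary word
    i = 0
    while i < len(text):
        # Try the longest dictionary word that starts at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                if pending:  # flush unknown run as one chunk (no breaks inside it)
                    words.append(pending)
                    pending = ""
                words.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: extend the unknown run.
            pending += text[i]
            i += 1
    if pending:
        words.append(pending)
    return words


def insert_zwsp(text: str, dictionary: set[str]) -> str:
    """Join the segmented words with ZWSP so wrapping can happen between them."""
    return ZWSP.join(segment_longest_match(text, dictionary))


if __name__ == "__main__":
    # Hypothetical toy word list; a real tool needs a full Thai lexicon or model.
    toy_dictionary = {"สวัสดี", "ครับ"}
    print(repr(insert_zwsp("สวัสดีครับ", toy_dictionary)))
```

Greedy matching can choose the wrong boundary when a shorter word would have led to a better split, which is where more sophisticated segmenters earn their keep.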
I'm not sure of the current state of play, but I do know that the text used as the test bed for the machine learning was Philip Pope's *ThaiKJV*, which was also the source text for our module.

An unrelated discussion of the same subject gives the flavour: https://stackoverflow.com/questions/8492763/thai-line-breaking-how-to-break-thai-text-effectively

It was Michael Hart who alerted me to this, back in 2012 or earlier. WBTC (as was) even used *KUCut* to add ZWSP to their *ThaiERV* translation to improve word-wrapping.

*KUCut* was described here: http://veer66.wordpress.com/2009/11/23/kucutwindows/

Back in 2012, the Python source code was maintained here: https://bitbucket.org/veer66/kucut

And there's an online demo (probably using the same source) here: http://www.thai-language.com/?nav=zwsp

There's now also a *KUCut* repository on GitHub.

Thai isn't unique, either. See https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries

But we won't go there... yet! Tag this as "something to do on a rainy day".

Blessings,
David

PS. I haven't checked whether any of the above links are broken.

--
View this message in context: http://sword-dev.350566.n4.nabble.com/Soft-hyphens-tp4657045p4657050.html
Sent from the SWORD Dev mailing list archive at Nabble.com.
sword-devel mailing list: http://www.crosswire.org/mailman/listinfo/sword-devel
