Andre Couture wrote:

> Hi
> I did not follow the entire conversation here but I was curious as of why
> would someone put a non breaking space between two words?
> We face that in other areas of our code as well.
>
> If the idea of the nbsp is to keep the two apparent words together, would
> it be good to handle the nbsp as an hyphen? Which mean that the two
> words could be treated as two words or a single one??
No. The purpose of the nbsp is to prevent a line break where one would be unsuitable. A good example is "80 kg": it would be ugly if "80" ended up at the end of one line and "kg" at the beginning of the next. So using a nbsp between numbers and units is useful. But for LanguageTool this is irrelevant, i.e. the nbsp should be treated like an ordinary space.

There are other good examples of nbsp usage in English here:

http://english.stackexchange.com/questions/28467/when-is-it-appropriate-to-use-non-breaking-spaces

There are plenty of space characters in Unicode. See:

https://en.wikipedia.org/wiki/Whitespace_character

Ideally the tokenization should treat them all alike, I think, but I have not checked.

In French at least, handling "U+202F NARROW NO-BREAK SPACE" correctly (i.e. as a space for LT) would be useful. This is the recommended space to use in front of the punctuation characters ? ! ; :

I know that English and other languages don't put a space before those punctuation characters, but French does, and it would be ugly if the punctuation character ended up on the next line.

https://fr.wikipedia.org/wiki/Espace_fine_ins%C3%A9cable

Regards
Dominique
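
To illustrate what "treat every space character like a plain space" could look like before tokenization, here is a minimal Python sketch. It is not LanguageTool's tokenizer (I have not checked how LT does it). The names normalize_spaces and tokenize are made up for this example, and it assumes that "space character" means Unicode general category Zs, which covers U+00A0 and U+202F but not tabs or newlines.

```python
import unicodedata


def normalize_spaces(text):
    """Replace every Unicode space separator (category Zs) with U+0020.

    Assumption: only category Zs is normalized. Tabs and newlines are
    control characters (category Cc) and are left untouched.
    """
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch
        for ch in text
    )


def tokenize(text):
    """Hypothetical, very naive tokenizer: split on plain spaces only.

    Because it runs after normalize_spaces, a no-break space separates
    tokens exactly like an ordinary space does.
    """
    return [tok for tok in normalize_spaces(text).split(" ") if tok]


if __name__ == "__main__":
    # U+00A0 NO-BREAK SPACE between a number and its unit
    print(tokenize("80\u00a0kg"))
    # U+202F NARROW NO-BREAK SPACE before French punctuation
    print(tokenize("Vraiment\u202f?"))
```

With this normalization step, "80 kg" written with a nbsp and "Vraiment ?" written with a narrow no-break space both come out as two tokens, the same as if an ordinary space had been typed.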