On Mon, 01 Sep 2014, Ferran Jorba wrote:
> May I know how does it handle Catalan middle dot character, used between
> two l (that is, «l·l»), for words like «paral·lel» or «cal·ligrafia»?
It is not doing much currently:
In [2]: strip_accents('cal·ligrafia')
'cal\xc2\xb7ligrafia'
In [3]: strip_accents('col•lectiu')
'col\xe2\x80\xa2lectiu'
i.e. basically keeping it in.
Perhaps the middle dot could always be stripped away before
indexing/searching/comparing terms? Are there cases where "foo·bar·baz",
"foo·barbaz" and "foobar·baz" mean three different things, yet
share the same dotless ASCII transliteration? If not, perhaps one
could always treat 'l·l' as 'll'?
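
A minimal sketch of that idea, assuming Python 3 unicode strings rather
than the byte strings shown above. The helper name collapse_middle_dot
is hypothetical and not part of the existing code; it would run as a
pre-processing step before strip_accents. It only removes a dot that
sits between two l's, so other uses of the middle dot are left alone:

```python
import re

# Hypothetical helper (an assumption, not existing code): collapse the
# Catalan "l·l" into a plain "ll" before any accent stripping is done.
#
# U+00B7 is the real middle dot. U+2022 (bullet) and U+2027 (hyphenation
# point) are included because they are sometimes typed in its place.
_DOT_BETWEEN_LS = re.compile('(?<=[lL])[\u00b7\u2022\u2027](?=[lL])')

# Precomposed "l with middle dot" characters (U+0140, U+013F).
_PRECOMPOSED = {'\u0140': 'l', '\u013f': 'L'}


def collapse_middle_dot(text):
    """Return text with any 'l·l' sequence rewritten as 'll'."""
    for char, replacement in _PRECOMPOSED.items():
        text = text.replace(char, replacement)
    # Drop the dot only when it is surrounded by l/L on both sides.
    return _DOT_BETWEEN_LS.sub('', text)


print(collapse_middle_dot('cal\u00b7ligrafia'))
print(collapse_middle_dot('col\u2022lectiu'))
print(collapse_middle_dot('foo\u00b7bar'))
```

With this rule 'cal·ligrafia' becomes 'calligrafia', while a string
such as 'foo·bar' is returned unchanged, so the question about
"foo·bar·baz" collisions does not arise for the l·l case.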
Best regards
--
Tibor Simko