Re: cannot reindex virtual index

Tibor Simko Sun, 31 Aug 2014 16:24:02 -0700

On Tue, 06 May 2014, Petr Brož wrote:
>>  https://github.com/inveniosoftware/invenio/issues/425
>
> Our change to strip_accents was a bit more opportunistic. We have just
> added some more accented letters to the repertoire of regexps used
> there and also added unicode normalization as the initial step there.
>
>> I did not propose a patch because I don't know how to implement the
>> tests.
>
> Me either :(


On this accent-stripping topic, I have an almost-finished branch that
should properly handle ASCIIfication of Czech and many other languages
out of the box.  The only exceptions may be the CJK family of
languages and Greek, for which opinions differ:

   https://github.com/inveniosoftware/invenio/issues/1675

Here is an example:

 In [1]: x = "Všichni lidé se rodí svobodní a sobě rovní " \
             "co do důstojnosti a práv."
   
 In [2]: from invenio.textutils import strip_accents
   
 In [3]: strip_accents(x)
 'Vsichni lide se rodi svobodni a sobe rovni co do dustojnosti a prav.'
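
For anyone curious how the Unicode normalization step mentioned in the
quoted text can work on its own, here is a minimal sketch using only the
Python standard library.  It is not the code from the branch, and
strip_accents_sketch is a hypothetical name chosen for illustration: it
decomposes each character (NFKD) and then drops the combining marks.

```python
import unicodedata


def strip_accents_sketch(text):
    """Hypothetical helper, not Invenio's strip_accents.

    Decompose characters into base letter + combining marks (NFKD),
    then drop the combining marks.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


x = ("Všichni lidé se rodí svobodní a sobě rovní "
     "co do důstojnosti a práv.")
print(strip_accents_sketch(x))
```

Letters that have no canonical decomposition, such as the Polish "Ł",
pass through this sketch unchanged, which is one reason an explicit
mapping or a set of regexps is still needed on top of normalization.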
   
Best regards
-- 
Tibor Simko
