Oh yes, I also use Spring Cache, which works fine, so I don't have to store
products in Lucene. That keeps the index smaller and faster.

On Fri, 23 Sept 2022, 19:26 Stephane Passignat, <passig...@hotmail.com>
wrote:

> Hi
>
> I wouldn't store the original value. That's "just" an index. But do store
> your DB identifiers, because I think you'll want them at some point.
> (I built the same kind of feature on top of DataNucleus.)
>
> I always keep technical IDs in my DB, even more so since I started using
> JDO/JPA some 20 years ago.
>
> With Lucene I would also suggest storing a display-ready view of each
> entity. That way the info to show is available without querying the DB.
> As you won't be able to index a whole large database at once, think about
> how the indexer restarts. Having a numeric ID and a last-update field
> helped me.
>
> Have you thought about numbers?
>
> On 23 Sept 2022, at 09:30, "Hrvoje Lončar" <horv...@gmail.com> wrote:
> Hi Stephane!
>
> Actually, I have exactly that kind of conversion, but I didn't mention it
> as my mail was long enough without it :)
> My main concern is whether I should let Lucene index the original keywords
> or not. Considering what you wrote, I guess your answer would be to store
> only the converted values, without exotic characters.
>
> Thanks a lot for your reply!
>
> BR,
> Hrvoje
>
> On Thu, Sep 22, 2022 at 7:53 PM Stephane Passignat <passig...@hotmail.com>
> wrote:
> Hello,
>
> The way I did it took me some time, and I'm almost sure it's applicable to
> all languages.
>
> I normalized the words, replacing letters or groups of letters with a
> similar-sounding one.
>
> In French, e é è ê ai ei sound somewhat alike, and for someone who makes
> spelling mistakes, having to use the right letters is very frustrating.
> So I transformed all of them into e...
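
A minimal Java sketch of the kind of sound-based normalization described above. The class and method names are hypothetical and the rule set covers only the letters listed in the message, so this is an illustration under those assumptions, not Stephane's actual code.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: maps French spellings that sound like "e" to a plain "e".
final class FrenchSoundFolding {
    static String normalize(String word) {
        // Compose accents first so that "é" is one code point and the regex can match it.
        String composed = Normalizer.normalize(word, Normalizer.Form.NFC);
        return composed.toLowerCase(Locale.ROOT)
                       .replaceAll("ai|ei|[éèê]", "e");
    }

    public static void main(String[] args) {
        System.out.println(normalize("été"));   // ete
        System.out.println(normalize("Lait"));  // let
    }
}
```

The same function would have to run on both the indexed text and the search terms for the two to match.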
>
> Hope it helps
>
> On 22 Sept 2022, at 16:37, "Hrvoje Lončar" <horv...@gmail.com> wrote:
>
> Hi!
>
> I'm using Hibernate Search / Lucene to index my entities in a Spring Boot
> application.
>
> One thing I'm not sure about is how to handle Croatian-specific letters.
> The Croatian language has a few additional letters: "č Č ć Ć đ Đ š Š ž Ž".
> The letters "đ Đ" are commonly replaced with "dj DJ" when no Croatian
> letters are available.
>
> In my custom Hibernate bridge there is a step that replaces all Croatian
> characters with appropriate ASCII replacements, which means "č" becomes
> "c", "š" becomes "s" and so on.
> Later, when a user enters search text, the same conversion is applied so
> that it matches the values in the index.
> There is one more good thing about it: some older users, who used
> computers in the early days when no Croatian letters were available, type
> words without Croatian letters, automatically replacing "č" with "c",
> and that fits my logic and gives good search results.
>
> For example, the title of my entity is: "juha s češnjakom i đumbirom".
> My custom Hibernate String bridge converts it to "juha cesnjakom dumbirom".
> Then a user enters "juha s češnjakom".
> Before issuing a search, the same conversion is applied to the user's query
> and the text sent to Lucene is "juha cesnjakom".
> This is how I implemented it and it's working fine.
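
The conversion step described above could look roughly like the following Java sketch. The class and method names are hypothetical, and unlike the example in the message it does not drop short words such as "s" and "i". It only folds the Croatian letters, using `java.text.Normalizer` plus an explicit rule for "đ", which has no canonical decomposition.

```java
import java.text.Normalizer;

// Hypothetical helper: folds Croatian letters to their ASCII look-alikes.
final class CroatianFolding {
    static String fold(String text) {
        // "đ"/"Đ" do not decompose under NFD, so replace them explicitly.
        String replaced = text.replace("đ", "d").replace("Đ", "D");
        // NFD splits e.g. "č" into "c" + a combining caron.
        String decomposed = Normalizer.normalize(replaced, Normalizer.Form.NFD);
        // Drop the combining marks, leaving the base ASCII letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The same function is applied at index time and at query time.
        System.out.println(fold("juha s češnjakom i đumbirom")); // juha s cesnjakom i dumbirom
        System.out.println(fold("juha s češnjakom"));            // juha s cesnjakom
    }
}
```

Because both sides go through the same function, a query typed with or without Croatian letters ends up as the same ASCII text.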
>
> The other way would be to index the original text, then find the words with
> Croatian characters, convert them to ASCII and add them to the original.
> The title "juha s češnjakom i đumbirom" would become "juha češnjakom
> đumbirom cesnjakom dumbirom".
> In that case there is no need to convert users' search terms, because
> both "juha s češnjakom" and "juha s cesnjakom" would return the same result.
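
A rough Java sketch of this alternative, under the assumption that the folded forms are simply appended to the text before indexing. The names are hypothetical and short words are kept rather than dropped.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps the original words and appends ASCII-folded variants.
final class DualFormIndexText {
    static String fold(String text) {
        String replaced = text.replace("đ", "d").replace("Đ", "D");
        String decomposed = Normalizer.normalize(replaced, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    static String expand(String text) {
        List<String> words = new ArrayList<>();
        List<String> foldedVariants = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            words.add(word);
            String ascii = fold(word);
            // Only words that actually contain Croatian letters get a second form.
            if (!ascii.equals(word)) {
                foldedVariants.add(ascii);
            }
        }
        words.addAll(foldedVariants);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(expand("juha s češnjakom i đumbirom"));
        // juha s češnjakom i đumbirom cesnjakom dumbirom
    }
}
```

At the analyzer level, Lucene's `ASCIIFoldingFilter` has a `preserveOriginal` option that appears to cover the same idea, so it may be worth checking before hand-rolling this in a bridge.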
>
> My question is:
> Is there any reason to switch to this alternative logic and have original
> keywords indexed in parallel with those converted to ASCII?
>
> Thanks!
>
> BR,
> Hrvoje
>
