Oh yes, I also use Spring Cache, which works fine, so I don't have to store products in Lucene. That keeps the index smaller and faster.
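A rough sketch of that split, in case it helps anyone reading the archive: the index hands back only identifiers, and the entities are resolved through a cache that falls back to the database on a miss. This is plain Java, not the Spring Cache or Lucene API; the class IdOnlyLookup, the loader function and the dbLoads counter are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration: resolve search hits (ids only) through a cache.
class IdOnlyLookup {
    // Stand-in for the cache layer: id -> display value.
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the database access (assumed, e.g. a repository call).
    private final Function<Long, String> loader;
    // Counts loader calls, for demonstration only (not thread-safe).
    int dbLoads = 0;

    IdOnlyLookup(Function<Long, String> loader) {
        this.loader = loader;
    }

    // hitIds are the identifiers returned by the index, in ranking order.
    List<String> resolve(List<Long> hitIds) {
        List<String> out = new ArrayList<>();
        for (Long id : hitIds) {
            // The loader runs only on a cache miss.
            out.add(cache.computeIfAbsent(id, k -> {
                dbLoads++;
                return loader.apply(k);
            }));
        }
        return out;
    }
}
```

With this shape the index needs only the identifier as a stored field, and repeated hits on the same products do not touch the database again.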
On Fri, 23 Sept 2022, 19:26 Stephane Passignat, <passig...@hotmail.com> wrote:

> Hi
>
> I would not store the original value. That's "just" an index. But store
> the value of your db identifiers, because I think you'll want it at some
> point. (I made the same kind of feature on top of DataNucleus.)
>
> I usually have a technical ID in my db, even more so since I started
> using JDO/JPA some 20 years ago.
>
> With Lucene I would also suggest storing a display-ready view of the
> entities. This gives you ready-to-display info without querying the db.
> As you won't be able to index a whole big database, think about
> restarting the indexer. Having a numeric ID and a last-update field
> helped me.
>
> Have you thought about numbers?
>
> On 23 Sept 2022, at 09:30, "Hrvoje Lončar" <horv...@gmail.com> wrote:
>
> Hi Stephane!
>
> Actually, I have exactly that kind of conversion, but I didn't mention it
> as my mail was long enough without it :)
> My main concern is whether I should let Lucene index the original
> keywords or not.
> Considering what you wrote, I guess your answer would be to store only
> converted values without exotic characters.
>
> Thanks a lot for your reply!
>
> BR,
> Hrvoje
>
> On Thu, Sep 22, 2022 at 7:53 PM Stephane Passignat
> <passig...@hotmail.com> wrote:
>
> Hello,
>
> The way I did it took me some time, and I am almost sure it's applicable
> to all languages.
>
> I normalized the words, replacing letters or groups of letters with a
> similar-sounding one.
>
> In French, e, é, è, ê, ai and ei sound much the same, and for someone who
> makes spelling mistakes, having to use the right letters is very
> frustrating. So I transformed all of them into e...
>
> Hope it helps
>
> On 22 Sept 2022, at 16:37, "Hrvoje Lončar" <horv...@gmail.com> wrote:
>
> Hi!
>
> I'm using Hibernate Search / Lucene to index my entities in a Spring Boot
> application.
>
> One thing I'm not sure about is how to handle Croatian-specific letters.
> The Croatian language has a few additional letters: "č Č ć Ć đ Đ š Š ž Ž".
> The letters "đ Đ" are commonly replaced with "dj DJ" when no Croatian
> letters are available.
>
> In my custom Hibernate bridge there is a step that replaces all Croatian
> characters with appropriate ASCII replacements, which means "č" becomes
> "c", "š" becomes "s" and so on.
> Later, when the user enters search text, the same process is applied to
> match values from the index.
> There is one more good thing about it: some older users, who used
> computers in the early days when no Croatian letters were available, type
> words without Croatian letters, automatically replacing "č" with "c", and
> that fits my logic and gives good search results.
>
> For example, the title of my entity is: "juha s češnjakom i đumbirom".
> My custom Hibernate String bridge converts it to "juha cesnjakom
> dumbirom".
> Then the user enters "juha s češnjakom".
> Before issuing a search, the same conversion is applied to the user's
> query, and the text sent to Lucene is "juha cesnjakom".
> This is how I implemented it and it's working fine.
>
> The other way would be to index the original text and then find words
> with Croatian characters, convert them to ASCII and add them to the
> original.
> The title "juha s češnjakom i đumbirom" would become "juha češnjakom
> đumbirom cesnjakom dumbirom".
> In that case there is no need to convert users' search terms because both
> "juha s češnjakom" and "juha s cesnjakom" would return the same result.
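To make the two options above concrete, here is a minimal sketch in plain Java. It is not the bridge from the original mail, which was not posted; the class CroatianFolding and the methods fold and expand are hypothetical names. It assumes "đ" maps to "d" (as in the "dumbirom" example, not the "dj" variant) and leaves stop-word removal ("s", "i") to the analyzer, so those words stay in the output.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Hibernate Search or Lucene.
class CroatianFolding {

    // Option 1: replace Croatian letters with ASCII look-alikes.
    static String fold(String text) {
        // d with stroke (U+0111 / U+0110) has no canonical decomposition,
        // so it has to be mapped by hand.
        String s = text.replace('\u0111', 'd').replace('\u0110', 'D');
        // NFD splits c/s/z with caron and c with acute into
        // base letter + combining mark.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop the combining marks, keeping the base letters.
        return s.replaceAll("\\p{M}", "");
    }

    // Option 2: keep the original text and append the folded form of
    // every token that changed under folding.
    static String expand(String text) {
        StringBuilder out = new StringBuilder(text);
        for (String token : text.split("\\s+")) {
            String folded = fold(token);
            if (!folded.equals(token)) {
                out.append(' ').append(folded);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The title from the mail, written with Unicode escapes.
        String title = "juha s \u010de\u0161njakom i \u0111umbirom";
        System.out.println(fold(title));
        System.out.println(expand(title));
    }
}
```

With option 1 the same fold call has to run on the query as well; with option 2 the query can go to the index unchanged, at the cost of extra terms per document.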
>
> My question is:
> Is there any reason to switch to this alternative logic and have original
> keywords indexed in parallel with those converted to ASCII?
>
> Thanks!
>
> BR,
> Hrvoje
>
> --
> {{ Horvoje.net <https://horvoje.net/> ~~ VegCook.net <https://vegcook.net/>
> ~~ TheVegCat.com <https://thevegcat.com:9999/> ~~ Cuspajz.com
> <https://cuspajz.com/> ~~ VintageZagreb.net <https://vintagezagreb.net/>
> ~~ Sterilizacija.org <https://sterilizacija.org/> ~~ SmijSe.com
> <https://smijse.com/> ~~ HTMLutil.net <https://htmlutil.net/> ~~
> HTTPinfo.net <https://httpinfo.net/> }}