Dear Experts,

Could somebody please help and guide me on the queries below? I am somewhat stuck after a good number of different attempts.

Regards
Rajib

-----Original Message-----
From: Saha, Rajib
Sent: 27 May 2025 11:52
To: java-user@lucene.apache.org
Subject: RE: Suggestion needed for a case of Lucene Migration with TokenStream

Hi Uwe,

Thanks for your suggestions so far. We have been able to make good progress, but we are now stuck at a point where we need your expert advice.

As per our design, full content indexing works in two steps:
- In the first step, small Lucene indexes are created, each with 5-6 documents. We call these delta index files.
- In the second step, we try to merge the delta index files into the master index.

Below is a snippet of the code:
============================
IndexWriter masterIndexWriter = new IndexWriter(indexDir, config);
FSDirectory[] deltaIndexDirs = new FSDirectory[deltaIndexDirList.size()];
int j = 0;
for (Iterator<FSDirectory> i = deltaIndexDirList.iterator(); i.hasNext(); j++) {
    deltaIndexDirs[j] = i.next();
}
masterIndexWriter.addIndexes(deltaIndexDirs);
===========================

But on doing so, we get the exception below. I tried several things but could not get past the problem.
Do you suspect anything here? Can you please suggest something to resolve it?

=============================================
CaughtException while Merging in LuceneIndexEngine cannot change field "boe.search.wild_description" from index options=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS
java.lang.IllegalArgumentException: cannot change field "boe.search.wild_description" from index options=DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS to inconsistent index options=DOCS_AND_FREQS_AND_POSITIONS
    at org.apache.lucene.index.FieldInfos$FieldNumbers.addOrGet(FieldInfos.java:308)
    at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2984)
    at com.sap.businessobjects.platform.search.lucene.index.engine.LuceneIndexEngine.merge(LuceneIndexEngine.java:981)
=============================================

Regards
Rajib

-----Original Message-----
From: Uwe Schindler <u...@thetaphi.de>
Sent: 30 April 2025 02:03
To: java-user@lucene.apache.org
Subject: Re: Suggestion needed for a case of Lucene Migration with TokenStream

If this is Windows, the deletion may not work if IndexReaders or IndexWriters are still open in the same or another process. On Linux I have no idea; I would need an exception message. It should clearly say why it fails.

Uwe

On 29.04.2025 at 13:44, Saha, Rajib wrote:
> Hi Uwe,
>
> In our product we have different levels of indexing, such as MetaData and
> FullContent information of the reports.
> Rebuild indexing deletes the existing Lucene index files and does a fresh
> indexing of all the documents.
>
> When we go to the directory and delete the Lucene index files by hand,
> rebuild indexing works fine.
> But when we select Rebuild indexing from the product UI, indexing
> does not happen.
>
> I am debugging this further and will update you once I have a better
> picture. Our code for this area runs multiple tasks and threads, so it is
> taking time to debug.
>
> I suspect there may be a lock on the Lucene index files, and that because
> of it the deletion of the Lucene index files is not working with stopping
> the service. But this is a guess; the investigation is ongoing.
> Do you suspect anything?
>
> Regards
> Rajib
>
> -----Original Message-----
> From: Uwe Schindler <u...@thetaphi.de>
> Sent: 28 April 2025 17:59
> To: java-user@lucene.apache.org
> Subject: Re: Suggestion needed for a case of Lucene Migration with TokenStream
>
> Hi,
>
> what do you mean by "But same content on rebuilding the index is not
> working"?
>
> How do you rebuild the index? It is not enough to just read all
> documents as stored fields and reindex them. You need the original
> document data and basically run it through the same pipeline that you
> already have (so the indexing should be done by the same code that
> indexes new documents). So I'd write some code that reads the old data
> (if possible from source) or reads the old index (if all information
> that was indexed is available as stored fields), synthetically builds
> input data for the new indexer, and sends it to the API (or whatever you
> have for indexing in your new system).
>
> If you just have incomplete Lucene Document instances from the older
> Lucene index, I think you're lost. When you call
> IndexReader/IndexSearcher.document(), you only get stored fields, but
> that's not all the information that was originally used for indexing.
> Reading documents from IndexReader and passing them to IndexWriter does
> not work. It works from the API point of view, but the data is different.
>
> Uwe
>
> On 28.04.2025 at 12:43, Saha, Rajib wrote:
>> Hi Uwe,
>>
>> Thank you for your detailed input and valuable advice. I fully understand
>> and agree that upgrading from such an old version of Lucene involves much
>> more than just resolving compilation issues.
>> Based on the latest Lucene version, we have redesigned our platform
>> accordingly, going through the Lucene APIs we use and replacing them with
>> their latest equivalents.
>>
>> With these changes, fresh content indexing is working fine. Search results
>> are also coming back as expected.
>> We greatly appreciate your expert guidance in bringing us to this point.
>>
>> But same content on rebuilding the index is not working.
>> I am debugging this part now.
>>
>> Do you have any suggestion on the problem?
>>
>> Regards
>> Rajib
>>
>> -----Original Message-----
>> From: Uwe Schindler <u...@thetaphi.de>
>> Sent: 25 April 2025 18:19
>> To: java-user@lucene.apache.org
>> Subject: Re: Suggestion needed for a case of Lucene Migration with TokenStream
>>
>> Hi,
>>
>> I'd like to mention the following: You are trying to upgrade Lucene from
>> a really ancient version. Of course, basic concepts are still the same,
>> but the search engine and its APIs have changed dramatically, so just
>> trying to "compile code and fix random stuff until it compiles" will not
>> bring you to a working product. On top of that, it may make the product
>> worse than before the update.
>>
>> To do the upgrade correctly, it is recommended to have somebody
>> available (ideally the person who wrote the code originally) and then go
>> through it line by line and rewrite it. I am explicitly mentioning
>> "rewrite" because that's what you should do! If you don't have a person
>> who understands Lucene well enough, I'd suggest getting help from outside.
>> You need to understand every line of code when rewriting it. In addition,
>> there are many new features that make special cases like
>> payloads on TokenStreams obsolete. I'd not recommend using something like
>> payloads on terms nowadays.
>>
>> Uwe
>>
>> On 24.04.2025 at 12:29, Mikhail Khludnev wrote:
>>> Right. TextField.TYPE_NOT_STORED should be used then.
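The two exceptions in this sub-thread match two argument checks that the stack traces place in the Field constructor (Field.java:152 and Field.java:155). Below is a minimal stand-alone model of those checks, written as plain Java and not as Lucene code. The class FieldTypeModel and its three flags are hypothetical stand-ins for FieldType, and the flag values given for a default FieldType, TextField.TYPE_STORED, and TextField.TYPE_NOT_STORED are assumptions based on Lucene 8.x that are worth confirming in the Javadoc.

```java
// Stand-alone model (NOT Lucene code) of the two checks reported in this thread.
// FieldTypeModel is a hypothetical stand-in for org.apache.lucene.document.FieldType.
final class FieldTypeModel {
    final boolean indexed;   // models indexOptions() != IndexOptions.NONE
    final boolean tokenized; // models tokenized()
    final boolean stored;    // models stored()

    FieldTypeModel(boolean indexed, boolean tokenized, boolean stored) {
        this.indexed = indexed;
        this.tokenized = tokenized;
        this.stored = stored;
    }

    // Assumed flag values for Lucene 8.x; verify against the Javadoc.
    static final FieldTypeModel DEFAULT = new FieldTypeModel(false, true, false);
    static final FieldTypeModel TYPE_STORED = new FieldTypeModel(true, true, true);
    static final FieldTypeModel TYPE_NOT_STORED = new FieldTypeModel(true, true, false);

    /** Returns the message a TokenStream-valued field would be rejected with, or null if accepted. */
    static String checkTokenStreamField(FieldTypeModel type) {
        if (!type.indexed || !type.tokenized) {
            return "TokenStream fields must be indexed and tokenized";
        }
        if (type.stored) {
            return "TokenStream fields cannot be stored";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("new FieldType():           " + checkTokenStreamField(DEFAULT));
        System.out.println("TextField.TYPE_STORED:     " + checkTokenStreamField(TYPE_STORED));
        System.out.println("TextField.TYPE_NOT_STORED: " + checkTokenStreamField(TYPE_NOT_STORED));
    }
}
```

Under these assumptions, an unconfigured FieldType fails the first check, TYPE_STORED passes it but fails the second, and TYPE_NOT_STORED passes both, which is the sequence reported in the messages quoted below.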
>>>
>>> On Thu, Apr 24, 2025 at 10:37 AM Saha, Rajib <rajib.s...@sap.com.invalid>
>>> wrote:
>>>
>>>> Thanks Mikhail for the suggestion.
>>>> The previous exception is gone now, but a new exception is thrown from
>>>> Field.java.
>>>> The exception details are below.
>>>> ========
>>>> java.lang.IllegalArgumentException: TokenStream fields cannot be stored
>>>>     at org.apache.lucene.document.Field.<init>(Field.java:155)
>>>> =========
>>>>
>>>> Can you please advise on this too?
>>>>
>>>> Regards
>>>> Rajib
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Mikhail Khludnev <m...@apache.org>
>>>> Sent: 24 April 2025 12:10
>>>> To: java-user@lucene.apache.org
>>>> Subject: Re: Suggestion needed for a case of Lucene Migration with TokenStream
>>>>
>>>> Hi
>>>> Use TextField.TYPE_STORED as the third argument in new Field()
>>>> see
>>>>
>>>> https://github.com/apache/lucene-solr/blob/e27f44e3d78dfcec230c97e0a1240e3751daeff9/lucene/core/src/java/org/apache/lucene/document/TextField.java#L35C33-L35C44
>>>>
>>>>
>>>> On Thu, Apr 24, 2025 at 8:37 AM Saha, Rajib <rajib.s...@sap.com.invalid>
>>>> wrote:
>>>>
>>>>> Hi Experts,
>>>>>
>>>>> We are migrating Lucene from 2.4.1 to 8.11.2.
>>>>>
>>>>> During the migration, one part of the code throws the exception below
>>>>> with our 8.11.2-based changes, from the line marked in red.
>>>>> =============
>>>>> java.lang.IllegalArgumentException: TokenStream fields must be indexed and tokenized
>>>>>     at org.apache.lucene.document.Field.<init>(Field.java:152)
>>>>>
>>>>> I tried a few options but was not able to resolve the error.
>>>>> Can somebody please help me identify where it is going wrong?
>>>>> We had code based on 2.4.1 like below:
>>>>> ===================================
>>>>> int currentVal = //some value
>>>>> PayloadTokenStream tokenStream = new PayloadTokenStream();
>>>>> tokenStream.setPayload(currentVal);
>>>>> lucField = new Field(config.payloadUid().name, tokenStream);
>>>>> doc.add(lucField);
>>>>> ......
>>>>> public class PayloadTokenStream extends TokenStream {
>>>>>     public static String UID_PAYLOAD_START_VAL = "_UID_";
>>>>>     private Token token = new Token(UID_PAYLOAD_START_VAL, 0, 0);
>>>>>     private byte[] buffer = new byte[4];
>>>>>     private boolean returnToken = false;
>>>>>
>>>>>     public void setPayload(int uid) {
>>>>>         buffer[0] = (byte) uid;
>>>>>         buffer[1] = (byte) (uid >> 8);
>>>>>         buffer[2] = (byte) (uid >> 16);
>>>>>         buffer[3] = (byte) (uid >> 24);
>>>>>         token.setPayload(new Payload(buffer));
>>>>>         returnToken = true;
>>>>>     }
>>>>>
>>>>>     public Token next() throws IOException {
>>>>>         if (returnToken) {
>>>>>             returnToken = false;
>>>>>             return token;
>>>>>         } else {
>>>>>             return null;
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>> We have made code based on 8.11.2 like below:
>>>>> ==========================================
>>>>> PayloadTokenStream tokenStream = new PayloadTokenStream();
>>>>> tokenStream.setPayload(currentVal);
>>>>> FieldType fieldType = new FieldType();
>>>>> lucField = new Field(config.payloadUid().name, tokenStream, fieldType);
>>>>> doc.add(lucField);
>>>>> ----
>>>>> public class PayloadTokenStream extends TokenStream {
>>>>>     public static String UID_PAYLOAD_START_VAL = "_UID_";
>>>>>     private byte[] buffer = new byte[4];
>>>>>     private boolean returnToken = false;
>>>>>
>>>>>     public void setPayload(int uid) {
>>>>>         buffer[0] = (byte) uid;
>>>>>         buffer[1] = (byte) (uid >> 8);
>>>>>         buffer[2] = (byte) (uid >> 16);
>>>>>         buffer[3] = (byte) (uid >> 24);
>>>>>         PayloadAttributeImpl attributeImpl = new PayloadAttributeImpl(new BytesRef(buffer));
>>>>>         addAttributeImpl(attributeImpl);
>>>>>         returnToken = true;
>>>>>     }
>>>>>
>>>>>     public boolean incrementToken() throws IOException {
>>>>>         if (returnToken) {
>>>>>             returnToken = false;
>>>>>             return true;
>>>>>         } else {
>>>>>             return false;
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> Regards
>>>>> Rajib
>>>>>
>>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>>
>> --
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> https://www.thetaphi.de/
>> eMail: u...@thetaphi.de
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
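On the addIndexes failure at the top of the thread: the message reads as if the field boe.search.wild_description is already registered with DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS while an incoming index declares it with DOCS_AND_FREQS_AND_POSITIONS, which would mean that at least two of the indexes involved were written with different field types for that field. A minimal sketch of a pre-merge audit follows, as plain Java and not Lucene code. The class IndexOptionsAudit, the index labels "master" and "delta-1", and the field example.title are hypothetical, and how the per-index maps of field name to index options are read from real indexes is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (NOT Lucene code): report fields whose index options
// differ between the indexes that are about to be merged.
final class IndexOptionsAudit {

    /**
     * @param indexes index label -> (field name -> index options name), in merge order
     * @return one line per field whose options differ from the first index that declared it
     */
    static List<String> findConflicts(Map<String, Map<String, String>> indexes) {
        Map<String, String> firstOptions = new LinkedHashMap<>();
        Map<String, String> firstIndex = new LinkedHashMap<>();
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> index : indexes.entrySet()) {
            for (Map.Entry<String, String> field : index.getValue().entrySet()) {
                // putIfAbsent returns null when this is the first declaration of the field.
                String seen = firstOptions.putIfAbsent(field.getKey(), field.getValue());
                if (seen == null) {
                    firstIndex.put(field.getKey(), index.getKey());
                } else if (!seen.equals(field.getValue())) {
                    conflicts.add(field.getKey() + ": " + firstIndex.get(field.getKey()) + "=" + seen
                            + " vs " + index.getKey() + "=" + field.getValue());
                }
            }
        }
        return conflicts;
    }

    /** Hypothetical sample data mirroring the exception in this thread. */
    static Map<String, Map<String, String>> sample() {
        Map<String, String> master = new LinkedHashMap<>();
        master.put("boe.search.wild_description", "DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS");
        master.put("example.title", "DOCS_AND_FREQS_AND_POSITIONS");

        Map<String, String> delta = new LinkedHashMap<>();
        delta.put("boe.search.wild_description", "DOCS_AND_FREQS_AND_POSITIONS");
        delta.put("example.title", "DOCS_AND_FREQS_AND_POSITIONS");

        Map<String, Map<String, String>> indexes = new LinkedHashMap<>();
        indexes.put("master", master);
        indexes.put("delta-1", delta);
        return indexes;
    }

    public static void main(String[] args) {
        findConflicts(sample()).forEach(System.out::println);
    }
}
```

If such an audit reports a field, a likely remedy is to make every code path that writes that field use one and the same field type, and then to rebuild whichever indexes disagree before merging.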