Hello, Rajib.

Lucene supports payloads through PayloadAttribute:
https://lucene.apache.org/core/8_11_1/core/org/apache/lucene/analysis/tokenattributes/PayloadAttribute.html

You can check
https://lucene.apache.org/core/8_11_1/analyzers-common/org/apache/lucene/analysis/payloads/package-summary.html
for examples of payload injection.

Enjoy.
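As a rough illustration of the old Token + Payload snippet quoted below, here is a minimal sketch of a one-token stream that attaches a payload through PayloadAttribute. It assumes Lucene 8.x on the classpath. The class name SinglePayloadTokenStream is made up for this example and is not part of Lucene. BytesRef stands in for the removed Payload class.

```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.BytesRef;

/** Emits exactly one token carrying a payload (hypothetical helper, not a Lucene class). */
final class SinglePayloadTokenStream extends TokenStream {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);

    private final String term;
    private final BytesRef payload;
    private boolean exhausted = false;

    SinglePayloadTokenStream(String term, byte[] buffer) {
        this.term = term;
        // BytesRef takes over the role of the old Payload class.
        this.payload = new BytesRef(buffer);
    }

    @Override
    public boolean incrementToken() {
        if (exhausted) {
            return false;
        }
        clearAttributes();
        termAtt.setEmpty().append(term);   // token text
        payloadAtt.setPayload(payload);    // bytes stored with this position
        exhausted = true;
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        exhausted = false;
    }
}
```

Such a stream could then be indexed with something like new TextField("uid", new SinglePayloadTokenStream(UID_PAYLOAD_START_VAL, buffer)), where the field name "uid" is only a placeholder. Payloads are stored per position, so the field type has to index positions, which the TextField types do.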
On Fri, Mar 3, 2023 at 3:48 PM Saha, Rajib <rajib.s...@sap.com.invalid> wrote:

> Hi Mikhail, Uwe,
>
> We have been able to overcome several hurdles.
> Thanks for your suggestions, which helped us a lot.
>
> We need one more suggestion. Previously, we used sample code like
> the following:
> =====================================
> byte[] buffer;
> private Token token = new Token(UID_PAYLOAD_START_VAL,0,0);
> //Setting buffer.
> token.setPayload(new Payload(buffer));
> ======================================
>
> Can you please suggest how we can do this with the latest Lucene, where
> the Token and Payload classes no longer exist?
>
> Regards
> Rajib
>
> -----Original Message-----
> From: Uwe Schindler <u...@thetaphi.de>
> Sent: 10 February 2023 15:36
> To: java-user@lucene.apache.org
> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
>
> Hi,
>
> the reason for this is that files in Lucene are always write-once. We
> never ever change a file after it was written and committed in the
> 2-phase-commit. If you write your own index files, e.g. as part of an
> index codec, you must adhere to this rule. See the DocValues or LiveDocs
> implementation for an example of how "changes" are done in later commits:
> it creates new files with a similar name and a different suffix, holding
> delta-like content.
>
> In general I would really avoid dealing with custom index files. Since
> Lucene 2 there have been so many new features that it is never a good
> idea to have your own index file format. Often DocValues is the solution
> to all the problems you had in early Lucene versions. If you add your own
> stuff without knowing how the transactional model of Lucene works, you
> may well cause index corruption. Index files need a carefully designed
> format, with thought given to transactional safety and performance.
>
> If you just want to deal with temporary files, the Directory API allows
> you to maintain temporary files, too.
>
> Uwe
>
> On 10.02.2023 at 06:49, Saha, Rajib wrote:
> > Hi Uwe,
> >
> > Thanks for the clarification.
> > We may have to rewrite the whole logic related to it, as seek
> > functionality no longer exists for IndexOutput.
> >
> > BTW, I have one more query related to it.
> > While experimenting, I see that in 8.11.2 the
> > directory.createOutput(String name, IOContext context) API throws
> > FileAlreadyExistsException if the file (say, output.index) already
> > exists.
> > Now I am wondering: if my process is closed and, in a new process, I
> > want to reuse the same file (output.index) and keep appending to it,
> > how can I achieve that?
> >
> > My sample code:
> > ========================================
> > try {
> >     SimpleFSDirectory directory = new SimpleFSDirectory(new File("E:\\Lucene-index").toPath());
> >     IndexOutput output = directory.createOutput("output.index", IOContext.DEFAULT);
> >     output.writeInt(223344);
> >     output.writeString("Testing Testing");
> >     output.close();
> > } catch(Exception e) {
> >     e.printStackTrace();
> > }
> > ==============================================
> >
> > Regards
> > Rajib
> >
> > -----Original Message-----
> > From: Uwe Schindler <u...@thetaphi.de>
> > Sent: 06 February 2023 16:46
> > To: java-user@lucene.apache.org
> > Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
> >
> > Hi,
> >
> > Since around Lucene 4 (maybe already in 3) there is no way to write
> > index files with random access anymore. All data must be written in
> > sequence (as a stream). This is especially important as Lucene
> > index files have used checksums since around Lucene 5.
> >
> > Uwe
> >
> > On 06.02.2023 at 11:57, Saha, Rajib wrote:
> >> Hi Mikhail,
> >>
> >> Thanks for all your suggestions in one shot.
> >> It helped us a lot.
> >> Thank you very much once again.
> >>
> >> Need one more suggestion for the API below.
> >> ==========================
> >> IndexOutput.seek(long pos)
> >> ==========================
> >>
> >> We have used it extensively, in around 40-50 places.
> >> This API no longer exists.
> >>
> >> Could you please suggest how we can handle this API in 8.11.2?
> >>
> >> Regards
> >> Rajib
> >>
> >>
> >> -----Original Message-----
> >> From: Mikhail Khludnev <m...@apache.org>
> >> Sent: 01 February 2023 12:22
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
> >>
> >> Hello, Rajib.
> >>
> >> On Mon, Jan 30, 2023 at 4:07 PM Saha, Rajib <rajib.s...@sap.com.invalid> wrote:
> >>
> >>> Hi Mikhail,
> >>>
> >>> Thanks for your suggestion. It solved lots of cases today on my end.
> >>>
> >>> I need some more suggestions from you. I am listing them one by one
> >>> below:
> >>> ================================
> >>> In 2.4, we used these APIs in a couple of cases:
> >>>
> >>> Field(String name, String value, Field.Store store, Field.Index index)
> >>> Field(String name, String value, Field.Store store, Field.Index index,
> >>> Field.TermVector termVector)
> >>>
> >> Check org.apache.lucene.document.StringField/TextField and their
> >> FieldType constants.
> >>
> >>
> >>> In 8.11, the closest corresponding API I can see is:
> >>> Field(String name, Reader reader, IndexableFieldType type)
> >>>
> >>> But I am not clear how to use IndexableFieldType for Field.Store,
> >>> Field.Index, and Field.TermVector.
> >>> Can you please suggest here?
> >>>
> >> Check usages of org.apache.lucene.document.Field.Store,
> >> org.apache.lucene.document.FieldType#setIndexOptions, and
> >> org.apache.lucene.document.FieldType#setStoreTermVectors.
> >>
> >>
> >>
> >>> =================================
> >>>
> >>> In 2.4, there was an API:
> >>> IndexReader.indexExists(File file)
> >>> It checks whether index files exist at the path.
> >>>
> >>> In 8.11, is there an API that does the same job?
> >>>
> >> org.apache.lucene.index.DirectoryReader#indexExists
> >>
> >>
> >>> ==================================
> >>> In 2.4, there were these APIs:
> >>> IndexReader.isLocked(FSDirectory fsdir)
> >>> IndexReader.unlock(Directory directory)
> >>>
> >>> In 8.11, are IndexReader and IndexWriter synchronized enough
> >>> internally that these APIs are no longer needed?
> >>>
> >> org.apache.lucene.store.BaseDirectory#obtainLock
> >> Lock.close()
> >>
> >> IndexWriters are mutually exclusive via the lock factory.
> >> org.apache.lucene.index.DirectoryReader#open(org.apache.lucene.index.IndexWriter)
> >> opens an NRT reader, i.e. it searches what is not yet committed.
> >>
> >>
> >>> Or is there any other class that contains suitable similar APIs?
> >>>
> >>> ==================================
> >>> If I have to delete a document from the index by doc ID, which API
> >>> should I use?
> >>>
> >>> Previously there was an API:
> >>> IndexReader.deleteDocument(docID)
> >>>
> >> org.apache.lucene.index.IndexWriter#deleteDocuments(org.apache.lucene.index.Term...)
> >>
> >>
> >>> ==================================
> >>> IndexWriter.addIndexesNoOptimize(FSDirectory[])
> >>> IndexWriter.optimize()
> >>>
> >>> Is there any similar concept in 8.11? If so, can you please help with
> >>> the APIs?
> >>>
> >> org.apache.lucene.index.IndexWriter#addIndexes(org.apache.lucene.store.Directory...)
> >> But it kicks off a merge underneath. Should be fine.
> >>
> >>> ===================================
> >>> Regards
> >>> Rajib
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Mikhail Khludnev <m...@apache.org>
> >>> Sent: 29 January 2023 18:05
> >>> To: java-user@lucene.apache.org
> >>> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
> >>>
> >>> Hello,
> >>> You can use
> >>> // gather list of valid fields from lucene
> >>> Collection<String> fields = FieldInfos.getIndexedFields(ir);
> >>> to loop over field names.
> >>> And then obtain the terms per field via
> >>>
> >>> https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/index/MultiTerms.html#getTerms-org.apache.lucene.index.IndexReader-java.lang.String-
> >>>
> >>> On Sun, Jan 29, 2023 at 2:08 PM Saha, Rajib <rajib.s...@sap.com.invalid> wrote:
> >>>
> >>>> Hi Mikhail,
> >>>>
> >>>> Thanks for the reference link.
> >>>> It really helped me.
> >>>>
> >>>> In one of my requirements, I need to extract all the terms in an
> >>>> IndexReader.
> >>>> I was trying the reference code "Fields fields = reader.fields();"
> >>>> from your reference link.
> >>>>
> >>>> But there is no "reader.fields()" in 8.11.2.
> >>>>
> >>>> Could you please suggest some way to extract all the terms with an
> >>>> IndexReader, or some alternative?
> >>>>
> >>>> Regards
> >>>> Rajib
> >>>>
> >>>> -----Original Message-----
> >>>> From: Mikhail Khludnev <m...@apache.org>
> >>>> Sent: 19 January 2023 04:26
> >>>> To: java-user@lucene.apache.org
> >>>> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
> >>>>
> >>>> Hello, Rajib.
> >>>> The API has evolved since 2.4, but it should be clear:
> >>>>
> >>>> https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/index/package-summary.html#fields
> >>>>
> >>>> On Wed, Jan 18, 2023 at 1:11 PM Saha, Rajib <rajib.s...@sap.com.invalid> wrote:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> We are in the process of converting our platform code from Lucene
> >>>>> 2.4.0 to 8.11.2.
> >>>>> We use Lucene extensively in our code.
> >>>>>
> >>>>> We have already migrated much of our code to the Lucene 8.11.2 APIs.
> >>>>>
> >>>>> But in a few places we are stuck on which new Lucene APIs to use, as
> >>>>> we cannot find a suitable match.
> >>>>>
> >>>>> Can somebody help with how to convert the code below from Lucene
> >>>>> 2.4.0 to 8.11.2?
> >>>>>
> >>>>>
> >>>>> ProcessDocs(IndexReader reader, Term t) {
> >>>>>
> >>>>>     final TermDocs termDocs = reader.termDocs();
> >>>>>     termDocs.seek(t);
> >>>>>     while (termDocs.next()) {
> >>>>>         //Some internal function to process the doc.
> >>>>>         forEach.process(termDocs.doc());
> >>>>>     }
> >>>>>
> >>>>> }
> >>>>>
> >>>>> Regards
> >>>>> Rajib
> >>>>>
> >>>> --
> >>>> Sincerely yours
> >>>> Mikhail Khludnev
> >>>> https://t.me/MUST_SEARCH
> >>>> A caveat: Cyrillic!
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de/
> eMail: u...@thetaphi.de

--
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
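
On the IndexOutput.seek(long pos) question quoted above: a common reason for seeking back is to patch a count or length that is only known after the body has been written. One possible way to live with sequential-only output is to stage the body in memory first and then emit header, body, and checksum in a single forward pass. The sketch below shows that pattern with plain java.io only. SequentialRecordWriter and its record layout are invented for illustration and are not Lucene APIs. In Lucene the final bytes would go through an IndexOutput instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;

/** Hypothetical helper: plain java.io, not a Lucene API. */
final class SequentialRecordWriter {

    /**
     * Encodes [int keptCount][kept ints...][long crc32] strictly front to back.
     * Only non-negative values are kept, so the count is unknown until the
     * whole input has been scanned.
     */
    static byte[] encode(int[] values) {
        try {
            // Stage the body in memory; this replaces "write a placeholder, seek back later".
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            DataOutputStream bodyOut = new DataOutputStream(body);
            int kept = 0;
            for (int v : values) {
                if (v >= 0) {
                    bodyOut.writeInt(v);
                    kept++;
                }
            }
            bodyOut.flush();

            // Emit header, body, and checksum in one forward pass.
            ByteArrayOutputStream file = new ByteArrayOutputStream();
            CRC32 crc = new CRC32();
            DataOutputStream out = new DataOutputStream(new CheckedOutputStream(file, crc));
            out.writeInt(kept);
            body.writeTo(out);
            out.flush();
            long checksum = crc.getValue(); // covers header + body only
            out.writeLong(checksum);
            out.flush();
            return file.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because nothing is rewritten after it has been emitted, a running checksum over the stream stays valid, which is the property Uwe points to. For bodies too large to hold in memory, the same idea can be applied by writing the body to a temporary file first.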