Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

Mikhail Khludnev Fri, 03 Mar 2023 05:44:26 -0800

Hello, Rajib.
Lucene still supports payloads through PayloadAttribute:
https://lucene.apache.org/core/8_11_1/core/org/apache/lucene/analysis/tokenattributes/PayloadAttribute.html
For examples of payload injection, see
https://lucene.apache.org/core/8_11_1/analyzers-common/org/apache/lucene/analysis/payloads/package-summary.html
Enjoy.
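
For illustration only, here is a minimal sketch of how the old Token + Payload snippet might be ported. It assumes that a stream emitting one token is enough for your case. The class name SinglePayloadTokenStream is hypothetical; PayloadAttribute, CharTermAttribute and BytesRef are the Lucene 8.x types that take over from Token and Payload.

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.util.BytesRef;

/** Hypothetical helper: emits exactly one token that carries a payload. */
final class SinglePayloadTokenStream extends TokenStream {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PayloadAttribute payloadAtt = addAttribute(PayloadAttribute.class);
  private final String term;
  private final byte[] payload;
  private boolean emitted = false;

  SinglePayloadTokenStream(String term, byte[] payload) {
    this.term = term;
    this.payload = payload;
  }

  @Override
  public boolean incrementToken() {
    if (emitted) {
      return false; // the single token was already produced
    }
    clearAttributes();
    termAtt.setEmpty().append(term);
    // BytesRef replaces the old Payload class as the payload container.
    payloadAtt.setPayload(new BytesRef(payload));
    emitted = true;
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    emitted = false; // allow the stream to be consumed again
  }
}
```

Such a stream could be added to a document with new TextField("uid", stream), where "uid" is a made-up field name. Payloads are only indexed when the field type indexes positions, which TextField does.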


On Fri, Mar 3, 2023 at 3:48 PM Saha, Rajib <rajib.s...@sap.com.invalid>
wrote:

> Hi Mikhail, Uwe,
>
> We have been able to overcome several hurdles.
> Thanks for your suggestions, which helped us a lot. 😊
>
> We need one more suggestion. Previously, we used code like the sample
> below:
> =====================================
> byte[] buffer;
> private Token token = new Token(UID_PAYLOAD_START_VAL,0,0);
> //Setting buffer.
> token.setPayload(new Payload(buffer));
> ======================================
>
> Can you please suggest how we can do this with the latest Lucene, where the
> Token and Payload classes no longer exist?
>
> Regards
> Rajib
>
> -----Original Message-----
> From: Uwe Schindler <u...@thetaphi.de>
> Sent: 10 February 2023 15:36
> To: java-user@lucene.apache.org
> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
>
> Hi,
>
> the reason for this is that files in Lucene are always write-once. We
> never ever change a file after it was written and committed in the
> 2-phase-commit. If you write your own index files, e.g. as part of an
> index codec, you must adhere to this rule. See the DocValues or LiveDocs
> implementation for an example of how "changes" are done in later commits:
> new files are created with a similar name and a different suffix, holding
> some delta-like content.
>
> In general I would really avoid dealing with custom index files. Since
> Lucene 2 there have been so many new features that it is never a good idea
> to have your own index file format. Often DocValues is the solution to all
> the problems you had in early Lucene versions. If you add your own
> stuff without knowing how the transactional model of Lucene works, then you
> are possibly causing index corruption. Index files need a
> carefully designed file format, with thought given to transactional safety
> and performance.
>
> If you just want to deal with temporary files, the Directory API allows
> you to maintain temporary files, too.
>
> Uwe
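
The write-once rule described above can be sketched outside Lucene. The following is only an illustration using java.nio, not Lucene's Directory API and not Lucene's actual file naming: the class GenerationalStore, its methods, and the "name_N.dat" scheme are all hypothetical. Instead of reopening output.index to append, each later process writes a new generation file, and readers combine the generations.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical write-once store: every "change" becomes a new generation file. */
final class GenerationalStore {
  private final Path dir;
  private final String baseName;

  GenerationalStore(Path dir, String baseName) {
    this.dir = dir;
    this.baseName = baseName;
  }

  /** Generation numbers of the files written so far, in ascending order. */
  List<Long> generations() throws IOException {
    List<Long> gens = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, baseName + "_*.dat")) {
      for (Path p : files) {
        String n = p.getFileName().toString();
        // Strip the "<baseName>_" prefix and the ".dat" suffix.
        gens.add(Long.parseLong(n.substring(baseName.length() + 1, n.length() - 4)));
      }
    }
    gens.sort(Comparator.naturalOrder());
    return gens;
  }

  /** Writes a delta as a brand-new file; CREATE_NEW fails if the file already exists. */
  Path appendDelta(String content) throws IOException {
    List<Long> gens = generations();
    long next = gens.isEmpty() ? 1 : gens.get(gens.size() - 1) + 1;
    Path file = dir.resolve(baseName + "_" + next + ".dat");
    Files.write(file, content.getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    return file;
  }

  /** Reads all generations in order, yielding the logically "appended" content. */
  String readAll() throws IOException {
    StringBuilder sb = new StringBuilder();
    for (long gen : generations()) {
      byte[] bytes = Files.readAllBytes(dir.resolve(baseName + "_" + gen + ".dat"));
      sb.append(new String(bytes, StandardCharsets.UTF_8));
    }
    return sb.toString();
  }
}
```

No existing file is ever modified here, which is the property the rule protects. A real design would also need a commit step and cleanup of obsolete generations.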
>
> Am 10.02.2023 um 06:49 schrieb Saha, Rajib:
> > Hi Uwe,
> >
> > Thanks for the clarification.
> > We may have to rewrite all the logic related to it, as IndexOutput no
> > longer supports seek.
> >
> > BTW, I have one more query related to it.
> > While experimenting, I see that in 8.11.2 the directory.createOutput(String
> > name, IOContext context) API throws FileAlreadyExistsException if the file
> > [say output.index] already exists.
> > Now I am wondering: if my process is closed, and in a new process I want
> > to keep appending to the same file [output.index], how can I achieve
> > it?
> >
> > My Sample code:
> > ========================================
> > try {
> >       SimpleFSDirectory directory = new SimpleFSDirectory(new File("E:\\Lucene-index").toPath());
> >       IndexOutput output = directory.createOutput("output.index", IOContext.DEFAULT);
> >       output.writeInt(223344);
> >       output.writeString("Testing Testing");
> >       output.close();
> > } catch (Exception e) {
> >       e.printStackTrace();
> > }
> > ==============================================
> >
> >
> >
> > Regards
> > Rajib
> >
> > -----Original Message-----
> > From: Uwe Schindler <u...@thetaphi.de>
> > Sent: 06 February 2023 16:46
> > To: java-user@lucene.apache.org
> > Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
> >
> > Hi,
> >
> > Since around Lucene 4 (maybe already in 3) there is no way to write
> > index files in random access anymore. All data must be written in
> > sequence (aka input stream). This is especially important as Lucene
> > index files use checksums since around Lucene 5.
> >
> > Uwe
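
One way code that relied on IndexOutput.seek(long) is often restructured, sketched here under assumptions: if seek was used to go back and patch a header (a count or a length), the block can be assembled in memory, where random access is still possible, and then written out in a single forward pass. The sketch uses only the JDK. The class SequentialBlockWriter and the layout (int count, int values, CRC32 trailer) are hypothetical and are not Lucene's file format; with Lucene the finished bytes would go to IndexOutput.writeBytes instead of an OutputStream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical writer: back-patches in memory, then writes strictly forward. */
final class SequentialBlockWriter {

  /** Serializes values as [count:int][values:int...][crc32:long]. */
  static byte[] buildBlock(List<Integer> values) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 * values.size() + 8);
    buf.putInt(0); // placeholder for the count, patched below
    int count = 0;
    for (int v : values) {
      buf.putInt(v);
      count++;
    }
    // The "seek back" happens here, on the in-memory buffer only.
    buf.putInt(0, count);
    // The checksum covers everything written so far, computed once at the end.
    CRC32 crc = new CRC32();
    crc.update(buf.array(), 0, buf.position());
    buf.putLong(crc.getValue());
    return buf.array();
  }

  /** Emits the finished block in one sequential write. */
  static void writeTo(OutputStream out, List<Integer> values) throws IOException {
    out.write(buildBlock(values));
  }
}
```

Because the bytes leave the process only once and in order, a trailing checksum stays valid, which is the constraint mentioned above.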
> >
> > Am 06.02.2023 um 11:57 schrieb Saha, Rajib:
> >> Hi Mikhail,
> >>
> >> Thanks for all your suggestions in one shot.
> >> It helped us a lot.
> >> Thank you very much once again. 😊
> >>
> >> We need one more suggestion, for the API below.
> >> ==========================
> >> IndexOutput.seek(long pos)
> >> ==========================
> >>
> >> We have used it extensively in around 40-50 places.
> >> Currently, this API is not there.
> >>
> >> Could you please suggest, how we can handle the API in 8.11.2?
> >>
> >> Regards
> >> Rajib
> >>
> >>
> >> -----Original Message-----
> >> From: Mikhail Khludnev <m...@apache.org>
> >> Sent: 01 February 2023 12:22
> >> To: java-user@lucene.apache.org
> >> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
> >>
> >> Hello, Rajib.
> >>
> >> On Mon, Jan 30, 2023 at 4:07 PM Saha, Rajib <rajib.s...@sap.com.invalid>
> >> wrote:
> >>
> >>> Hi Mikhail,
> >>>
> >>> Thanks for your suggestion. It solved lots of cases today on my end. 😊
> >>>
> >>> I need some more suggestions from you. I have listed them below, one
> >>> by one:
> >>> ================================
> >>> In 2.4, we used these APIs in a couple of cases:
> >>>
> >>> Field(String name, String value, Field.Store store, Field.Index index)
> >>> Field(String name, String value, Field.Store store, Field.Index index,
> >>> Field.TermVector termVector)
> >>>
> >> Check org.apache.lucene.document.StringField/TextField and their FieldType
> >> constants.
> >>
> >>
> >>> In 8.11, the closest corresponding API I can see is:
> >>> Field(String name, Reader reader, IndexableFieldType type)
> >>>
> >>> But I am not clear how to use IndexableFieldType in place of Field.Store,
> >>> Field.Index, and Field.TermVector.
> >>> Can you please suggest something here?
> >>>
> >> check usages for org.apache.lucene.document.Field.Store
> >> org.apache.lucene.document.FieldType#setIndexOptions
> >> org.apache.lucene.document.FieldType#setStoreTermVectors
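
As a rough sketch of how these pointers fit together, assuming the old fields were "stored + analyzed + term vectors" for text and "stored + not analyzed" for identifiers: the class FieldPort and the field names "id" and "body" are made up for illustration, while FieldType, TextField, StringField and Field.Store are the Lucene 8.x classes named above.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

/** Hypothetical helper mapping 2.4-style Field settings onto a FieldType. */
final class FieldPort {

  // Roughly Field.Store.YES + analyzed index + Field.TermVector.YES:
  // start from TextField's stored type and switch term vectors on.
  static final FieldType STORED_TEXT_WITH_VECTORS = new FieldType(TextField.TYPE_STORED);

  static {
    STORED_TEXT_WITH_VECTORS.setStoreTermVectors(true);
    STORED_TEXT_WITH_VECTORS.freeze(); // guard against later modification
  }

  static Document build(String id, String body) {
    Document doc = new Document();
    // Roughly Field.Store.YES + not-analyzed index: one untokenized term.
    doc.add(new StringField("id", id, Field.Store.YES));
    // A custom FieldType goes through the generic Field constructor.
    doc.add(new Field("body", body, STORED_TEXT_WITH_VECTORS));
    return doc;
  }
}
```

Index options (for example docs only, or docs with positions) are set on the same FieldType through setIndexOptions before it is frozen.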
> >>
> >>
> >>
> >>> =================================
> >>>
> >>> In 2.4, there was an API:
> >>> IndexReader.indexExists(File file)
> >>> This checks, if index files exists in the path.
> >>>
> >>> Is there an API in 8.11 that does the same job?
> >>>
> >> org.apache.lucene.index.DirectoryReader#indexExists
> >>
> >>
> >>> ==================================
> >>> In 2.4, there was an API:
> >>> IndexReader.isLocked(FSDirectory fsdir)
> >>> IndexReader.unlock(Directory directory)
> >>>
> >>> In 8.11, are IndexReader and IndexWriter synchronized enough internally
> >>> that these APIs are no longer needed?
> >>>
> >> org.apache.lucene.store.BaseDirectory#obtainLock
> >> Lock.close()
> >>
> >> IndexWriters are mutually exclusive via the lock factory.
> >>
> >> org.apache.lucene.index.DirectoryReader#open(org.apache.lucene.index.IndexWriter)
> >> opens an NRT (near-real-time) reader, i.e. one that searches changes not
> >> yet committed.
> >>
> >>
> >>> Or does any other class contain suitable similar APIs?
> >>>
> >>> ==================================
> >>> If I have to delete a document from the index by doc ID, which API should
> >>> I use?
> >>>
> >>> Previously there was an API
> >>> IndexReader.deleteDocument(docID)
> >>>
> >> org.apache.lucene.index.IndexWriter#deleteDocuments(org.apache.lucene.index.Term...)
> >>
> >>
> >>> ==================================
> >>> IndexWriter.addIndexesNoOptimize(FSDirectory[])
> >>> IndexWriter.optimize()
> >>>
> >>> Is there any similar concept in 8.11? If so, can you please help with the
> >>> APIs?
> >>>
> >> org.apache.lucene.index.IndexWriter#addIndexes(org.apache.lucene.store.Directory...)
> >> But it kicks off a merge underneath. Should be fine.
> >>
> >> ===================================
> >>> Regards
> >>> Rajib
> >>>
> >>>
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Mikhail Khludnev <m...@apache.org>
> >>> Sent: 29 January 2023 18:05
> >>> To: java-user@lucene.apache.org
> >>> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
> >>>
> >>> Hello,
> >>> You can use
> >>>         // gather list of valid fields from lucene
> >>>         Collection<String> fields = FieldInfos.getIndexedFields(ir);
> >>> to loop field names.
> >>> And then obtain the terms per field via
> >>>
> >>> https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/index/MultiTerms.html#getTerms-org.apache.lucene.index.IndexReader-java.lang.String-
> >>>
> >>> On Sun, Jan 29, 2023 at 2:08 PM Saha, Rajib <rajib.s...@sap.com.invalid>
> >>> wrote:
> >>>
> >>>> Hi Mikhail,
> >>>>
> >>>> Thanks for the reference link.
> >>>> It really helped me.
> >>>>
> >>>> For one of my requirements, I need to extract all the Terms in an
> >>>> IndexReader.
> >>>> I was trying the reference code "Fields fields = reader.fields();" from
> >>>> your reference link.
> >>>>
> >>>> But there is no "reader.fields()" in 8.11.2.
> >>>>
> >>>> Could you please suggest some way to extract all the Terms from an
> >>>> IndexReader, or some alternative?
> >>>>
> >>>> Regards
> >>>> Rajib
> >>>>
> >>>> -----Original Message-----
> >>>> From: Mikhail Khludnev <m...@apache.org>
> >>>> Sent: 19 January 2023 04:26
> >>>> To: java-user@lucene.apache.org
> >>>> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
> >>>>
> >>>>
> >>>> Hello, Rajib.
> >>>> The API has evolved since 2.4, but the overview here should make it clear:
> >>>>
> >>>> https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/index/package-summary.html#fields
> >>>>
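
A possible 8.x counterpart of the TermDocs loop from the original question, offered as a sketch rather than a drop-in replacement: the class TermDocsPort and the IntConsumer callback (standing in for the forEach.process call) are hypothetical. Postings are read per index segment in 8.x, and deleted documents have to be skipped explicitly through the live-docs bits.

```java
import java.io.IOException;
import java.util.function.IntConsumer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;

/** Hypothetical port of the 2.4 TermDocs loop to the 8.x postings API. */
final class TermDocsPort {

  static void processDocs(IndexReader reader, Term t, IntConsumer forEach) throws IOException {
    // Postings are enumerated segment by segment.
    for (LeafReaderContext ctx : reader.leaves()) {
      // Only doc IDs are needed, so no frequencies or positions are requested.
      PostingsEnum postings = ctx.reader().postings(t, PostingsEnum.NONE);
      if (postings == null) {
        continue; // term or field is absent from this segment
      }
      Bits liveDocs = ctx.reader().getLiveDocs(); // null when nothing is deleted
      int doc;
      while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        if (liveDocs != null && !liveDocs.get(doc)) {
          continue; // skip deleted documents
        }
        // Convert the segment-local ID to an index-wide doc ID.
        forEach.accept(ctx.docBase + doc);
      }
    }
  }
}
```

A call could look like TermDocsPort.processDocs(reader, term, docId -> forEach.process(docId)), assuming forEach is the poster's own helper.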
> >>>> On Wed, Jan 18, 2023 at 1:11 PM Saha, Rajib <rajib.s...@sap.com.invalid>
> >>>> wrote:
> >>>>
> >>>>> Hi All,
> >>>>>
> >>>>> We are in the process of converting our platform code from Lucene
> >>>>> 2.4.0 to 8.11.2. We use Lucene extensively in our code.
> >>>>>
> >>>>> We have already moved much of our code to the Lucene 8.11.2 APIs.
> >>>>>
> >>>>> But in a few places we are stuck on which new Lucene APIs to use, as we
> >>>>> cannot find a suitable match.
> >>>>>
> >>>>> Can somebody help us convert the code below from Lucene 2.4.0 to
> >>>>> 8.11.2?
> >>>>>
> >>>>>
> >>>>> ProcessDocs(IndexReader reader, Term t) {
> >>>>>
> >>>>>         final TermDocs termDocs = reader.termDocs();
> >>>>>         termDocs.seek(t);
> >>>>>         while (termDocs.next()) {
> >>>>>                 //Some internal function to process the doc.
> >>>>>                 forEach.process(termDocs.doc());
> >>>>>         }
> >>>>>
> >>>>> }
> >>>>>
> >>>>> Regards
> >>>>> Rajib
> >>>>>
> >>>> --
> >>>> Sincerely yours
> >>>> Mikhail Khludnev
> >>>>
> >>>>
> >>>> https://t.me/MUST_SEARCH
> >>>> A caveat: Cyrillic!
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>
> >>>>
> >>> --
> >>> Sincerely yours
> >>> Mikhail Khludnev
> >>>
> >>> https://t.me/MUST_SEARCH
> >>> A caveat: Cyrillic!
> >>>
> >>>
> >>>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de/
> eMail: u...@thetaphi.de
>
>
>
>

-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!
