RE: Need help for conversion code from Lucene 2.4.0 to 8.11.2

Saha, Rajib Fri, 03 Mar 2023 04:48:15 -0800

Hi Mikhail, Uwe,

We have been able to overcome several hurdles.
Thanks for your suggestions, which helped us a lot. 😊


We need one more suggestion. Previously, we used code like the sample below:
=====================================
byte[] buffer;
private Token token = new Token(UID_PAYLOAD_START_VAL,0,0);
//Setting buffer.
token.setPayload(new Payload(buffer));
======================================

Can you please suggest how we can do the same with the latest Lucene, where
the Token and Payload classes are both gone?

Regards
Rajib

-----Original Message-----
From: Uwe Schindler <[email protected]> 
Sent: 10 February 2023 15:36
To: [email protected]
Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2

Hi,

the reason for this is that files in Lucene are always write-once. We
never ever change a file after it was written and committed in the
two-phase commit. If you write your own index files, e.g. as part of an
index codec, you must adhere to this rule. See the DocValues or LiveDocs
implementations for an example of how "changes" are done in later commits:
they create new files with a similar name and a different suffix that hold
delta-like content.
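
To illustrate the write-once rule, here is a minimal sketch in plain Java
(java.nio only, not Lucene code). The class WriteOnceStore, its method names
and the "_N.delta" naming are hypothetical and only mimic the idea of one new
file per commit; the files Lucene's codecs really write for DocValues and
LiveDocs updates look different.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: records "changes" as new generation files and never rewrites old ones. */
public class WriteOnceStore {
    private final Path dir;
    private final String baseName;

    public WriteOnceStore(Path dir, String baseName) {
        this.dir = dir;
        this.baseName = baseName;
    }

    /** Returns the highest generation found on disk, or 0 if there is none yet. */
    public long latestGeneration() throws IOException {
        long gen = 0;
        while (Files.exists(fileFor(gen + 1))) {
            gen++;
        }
        return gen;
    }

    /** Writes a delta into a brand-new file and returns its generation number. */
    public long writeNextGeneration(String delta) throws IOException {
        long next = latestGeneration() + 1;
        // CREATE_NEW throws FileAlreadyExistsException instead of overwriting.
        Files.write(fileFor(next), delta.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        return next;
    }

    /** Replays all generations in order to rebuild the current content. */
    public String readAll() throws IOException {
        StringBuilder sb = new StringBuilder();
        long last = latestGeneration();
        for (long gen = 1; gen <= last; gen++) {
            sb.append(new String(Files.readAllBytes(fileFor(gen)), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    private Path fileFor(long gen) {
        return dir.resolve(baseName + "_" + gen + ".delta");
    }
}
```

With this pattern a restarted process never reopens output_1.delta to append
to it; it calls writeNextGeneration(...) and gets output_2.delta, and readers
replay the generations in order.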

In general, I would really avoid dealing with custom index files. Lucene
has gained so many new features since version 2 that it is never a good
idea to have your own index file format. Often DocValues is the solution to
the problems you had in early Lucene versions. If you add your own files
without knowing how Lucene's transactional model works, you may well cause
index corruption. An index file format needs careful design, with thought
given to transactional safety and performance.
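
As a rough illustration of what transactional safety can involve, the
following plain-Java sketch (again not Lucene code; ChecksummedFile and the
".pending" suffix are made-up names) writes a file strictly in sequence,
appends a CRC32 footer, and only then renames it to its final name. It
assumes the file system supports atomic renames.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

/** Hypothetical sketch: sequential write, checksum footer, publish by rename. */
public class ChecksummedFile {

    /** Writes the payload once; refuses to touch a file that was already published. */
    public static void write(Path target, byte[] payload) throws IOException {
        if (Files.exists(target)) {
            throw new FileAlreadyExistsException(target.toString());
        }
        Path pending = target.resolveSibling(target.getFileName() + ".pending");
        CRC32 crc = new CRC32();
        try (DataOutputStream out = new DataOutputStream(
                new CheckedOutputStream(Files.newOutputStream(pending), crc))) {
            out.writeInt(payload.length);
            out.write(payload);
            // Footer: checksum of all bytes written before it.
            out.writeLong(crc.getValue());
        }
        // Readers only ever see the final name, so they never see a half-written file.
        Files.move(pending, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reads the payload back and verifies it against the stored checksum. */
    public static byte[] read(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (DataInputStream in = new DataInputStream(
                new CheckedInputStream(Files.newInputStream(file), crc))) {
            int length = in.readInt();
            byte[] payload = new byte[length];
            in.readFully(payload);
            long computed = crc.getValue(); // checksum of the bytes read so far
            long stored = in.readLong();
            if (computed != stored) {
                throw new IOException("checksum mismatch in " + file);
            }
            return payload;
        }
    }
}
```

A footer like this can only be computed if the bytes are written in order,
which is one reason a seek-and-patch style of writing does not fit such a
format.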

If you just want to deal with temporary files, the Directory API allows
you to maintain temporary files, too.

Uwe

On 10.02.2023 at 06:49, Saha, Rajib wrote:
> Hi Uwe,
>
> Thanks for the clarification.
> We may have to rewrite all of the logic related to it, as IndexOutput no
> longer offers seek functionality.
>
> BTW, I have one more query related to it.
> While experimenting, I see that the directory.createOutput(String name,
> IOContext context) API throws FileAlreadyExistsException in 8.11.2 if the
> file [say output.index] already exists.
> Now I am wondering: if my process is closed and, in a new process, I want
> to use the same file [output.index] and keep appending to it, how can I
> achieve that?
>
> My Sample code:
> ========================================
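> import java.nio.file.Paths;
> import org.apache.lucene.store.IOContext;
> import org.apache.lucene.store.IndexOutput;
> import org.apache.lucene.store.SimpleFSDirectory;
>
> // createOutput throws FileAlreadyExistsException if output.index exists.
> try (SimpleFSDirectory directory = new SimpleFSDirectory(Paths.get("E:\\Lucene-index"));
>      IndexOutput output = directory.createOutput("output.index", IOContext.DEFAULT)) {
>       output.writeInt(223344);
>       output.writeString("Testing Testing");
> } catch (Exception e) {
>       e.printStackTrace();
> }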
> ==============================================
>
>
>
> Regards
> Rajib
>
> -----Original Message-----
> From: Uwe Schindler <[email protected]>
> Sent: 06 February 2023 16:46
> To: [email protected]
> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
>
> Hi,
>
> Since around Lucene 4 (maybe already in 3) there is no way to write
> index files in random access anymore. All data must be written in
> sequence (i.e. as a stream). This is especially important as Lucene
> index files use checksums since around Lucene 5.
>
> Uwe
>
> On 06.02.2023 at 11:57, Saha, Rajib wrote:
>> Hi Mikhail,
>>
>> Thanks for all your suggestions in one shot.
>> It helped us a lot.
>> Thank you very much once again. 😊
>>
>> We need one more suggestion, for the API below.
>> ==========================
>> IndexOutput.seek(long pos)
>> ==========================
>>
>> We have used it extensively, in around 40-50 places.
>> This API no longer exists.
>>
>> Could you please suggest how we can replace it in 8.11.2?
>>
>> Regards
>> Rajib
>>
>>
>> -----Original Message-----
>> From: Mikhail Khludnev <[email protected]>
>> Sent: 01 February 2023 12:22
>> To: [email protected]
>> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
>>
>> Hello, Rajib.
>>
>> On Mon, Jan 30, 2023 at 4:07 PM Saha, Rajib <[email protected]>
>> wrote:
>>
>>> Hi Mikhail,
>>>
>>> Thanks for your suggestion. It solved lots of cases on my end today. 😊
>>>
>>> I need some more suggestions from you. I have listed the questions one by
>>> one below:
>>> ================================
>>> In 2.4, we used these constructors in a couple of cases:
>>>
>>> Field(String name, String value, Field.Store store, Field.Index index)
>>> Field(String name, String value, Field.Store store, Field.Index index,
>>> Field.TermVector termVector)
>>>
>> Check org.apache.lucene.document.StringField/TextField and its FieldType
>> constants.
>>
>>
>>> In 8.11, the closest corresponding API I can see is:
>>> Field(String name, Reader reader, IndexableFieldType type)
>>>
>>> But I am not clear how to use IndexableFieldType in place of Field.Store,
>>> Field.Index and Field.TermVector.
>>> Can you please advise?
>>>
>> check usages for org.apache.lucene.document.Field.Store
>> org.apache.lucene.document.FieldType#setIndexOptions
>> org.apache.lucene.document.FieldType#setStoreTermVectors
>>
>>
>>
>>> =================================
>>>
>>> In 2.4, there was an API:
>>> IndexReader.indexExists(File file)
>>> This checks whether index files exist in the path.
>>>
>>> Is there any API in 8.11 that does the same job?
>>>
>> org.apache.lucene.index.DirectoryReader#indexExists
>>
>>
>>> ==================================
>>> In 2.4, there was an API:
>>> IndexReader.isLocked(FSDirectory fsdir)
>>> IndexReader.unlock(Directory directory)
>>>
>>> In 8.11, are IndexReader and IndexWriter synchronized enough internally
>>> that these APIs are no longer needed?
>>>
>> org.apache.lucene.store.BaseDirectory#obtainLock
>> Lock.close()
>>
>> IndexWriters are mutually exclusive via the lock factory.
>> org.apache.lucene.index.DirectoryReader#open(org.apache.lucene.index.IndexWriter)
>> opens an NRT reader, i.e. one that searches what is not yet committed.
>>
>>
>>> Or does some other class contain suitable similar APIs?
>>>
>>> ==================================
>>> If I have to delete a document from the index by doc ID, which API should
>>> I use?
>>>
>>> Previously there was the API
>>> IndexReader.deleteDocument(docID)
>>>
>> org.apache.lucene.index.IndexWriter#deleteDocuments(org.apache.lucene.index.Term...)
>>
>>
>>> ==================================
>>> IndexWriter.addIndexesNoOptimize(FSDirectory[])
>>> IndexWriter.optimize()
>>>
>>> Is there a similar concept in 8.11? If so, can you please point us to the
>>> APIs?
>>>
>> org.apache.lucene.index.IndexWriter#addIndexes(org.apache.lucene.store.Directory...)
>> But it kicks off a merge underneath. That should be fine.
>>
>> ===================================
>>> Regards
>>> Rajib
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Mikhail Khludnev <[email protected]>
>>> Sent: 29 January 2023 18:05
>>> To: [email protected]
>>> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
>>>
>>> Hello,
>>> You can use
>>>         // gather list of valid fields from lucene
>>>         Collection<String> fields = FieldInfos.getIndexedFields(ir);
>>> to loop over the field names,
>>> and then obtain the terms per field via
>>>
>>> https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/index/MultiTerms.html#getTerms-org.apache.lucene.index.IndexReader-java.lang.String-
>>>
>>> On Sun, Jan 29, 2023 at 2:08 PM Saha, Rajib <[email protected]>
>>> wrote:
>>>
>>>> Hi Mikhail,
>>>>
>>>> Thanks for the reference link.
>>>> It really helped me.
>>>>
>>>> For one of my requirements, I need to extract all the terms in an
>>>> IndexReader.
>>>> I was trying the reference code "Fields fields = reader.fields();" from
>>>> your reference link.
>>>>
>>>> But there is no "reader.fields()" in 8.11.2.
>>>>
>>>> Could you please suggest some way to extract all the terms with an
>>>> IndexReader, or an alternative approach?
>>>>
>>>> Regards
>>>> Rajib
>>>>
>>>> -----Original Message-----
>>>> From: Mikhail Khludnev <[email protected]>
>>>> Sent: 19 January 2023 04:26
>>>> To: [email protected]
>>>> Subject: Re: Need help for conversion code from Lucene 2.4.0 to 8.11.2
>>>>
>>>> Hello, Rajib.
>>>> The API has evolved since 2.4, but this should make it clear:
>>>>
>>>>
>>>> https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/index/package-summary.html#fields
>>>> On Wed, Jan 18, 2023 at 1:11 PM Saha, Rajib <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We are in the process of converting our platform code from Lucene 2.4.0
>>>>> to 8.11.2.
>>>>> We use Lucene extensively in our code.
>>>>>
>>>>> We have already moved much of our code to the Lucene 8.11.2 APIs.
>>>>>
>>>>> But in a few places we are stuck on which new Lucene APIs to use, as we
>>>>> cannot find a suitable match.
>>>>>
>>>>> Can somebody help me with how to convert the code below from Lucene 2.4.0
>>>>> to 8.11.2?
>>>>>
>>>>>
>>>>> void ProcessDocs(IndexReader reader, Term t) throws IOException {
>>>>>     // Lucene 2.4: enumerate the documents that contain term t.
>>>>>     final TermDocs termDocs = reader.termDocs();
>>>>>     termDocs.seek(t);
>>>>>     while (termDocs.next()) {
>>>>>         // Some internal function to process the doc.
>>>>>         forEach.process(termDocs.doc());
>>>>>     }
>>>>> }
>>>>>
>>>>> Regards
>>>>> Rajib
>>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>>
>>>>
>>>> https://t.me/MUST_SEARCH
>>>> A caveat: Cyrillic!
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>>
>>> https://t.me/MUST_SEARCH
>>> A caveat: Cyrillic!
>>>
-- 
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de/
eMail: [email protected]


