Nothing changed between the two index generations except the data, which changed slightly as I described.

The size I am reporting is the size of the directory where all the index files are stored, measured once Lucene has finished generating the index.
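
One way to see where the space goes is to break the directory size down by file extension and compare the two indexes. Below is a minimal sketch that uses only the JDK, not Lucene itself; the class and method names are made up for illustration, and the index directory is whatever path is passed on the command line. In Lucene 7.x, stored fields normally live in .fdt/.fdx files and the term dictionary in .tim/.tip files, unless compound files (.cfs) are in use, in which case the breakdown is less informative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class IndexSizeByExtension {

    // Collects "file name -> size in bytes" for the regular files directly inside dir.
    static Map<String, Long> listFileSizes(Path dir) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    sizes.put(p.getFileName().toString(), Files.size(p));
                }
            }
        }
        return sizes;
    }

    // Sums the sizes per file extension; names without a dot go under "(none)".
    static Map<String, Long> sizeByExtension(Map<String, Long> fileSizes) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            String name = e.getKey();
            int dot = name.lastIndexOf('.');
            String ext = dot >= 0 ? name.substring(dot + 1) : "(none)";
            totals.merge(ext, e.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the index directory (hypothetical; supply your own path).
        Map<String, Long> totals = sizeByExtension(listFileSizes(Paths.get(args[0])));
        totals.forEach((ext, bytes) -> System.out.printf("%-8s %,d bytes%n", ext, bytes));
    }
}
```

Running it on both index directories and comparing the per-extension totals should show which part of the index accounts for the size difference.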

I don't know about deleted docs. How do you trace that? Yes, the queries run exactly the same way (same number of results). Most of the time only the order changes, which is fine; occasionally a few different entries show up, and I don't know why, since the lowercase filter should normalize the text even if the casing of the original data changes.
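
To check whether the two datasets really differ only in casing, one option is to lowercase both and compare the resulting term sets. The sketch below is a rough stand-in, not Lucene's analysis chain: it assumes whitespace tokenization and uses String.toLowerCase(Locale.ROOT), which may differ from LowerCaseFilter for some Unicode characters. The class name, method names, and sample strings are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class NormalizedTermCheck {

    // Stand-in for "tokenizer + lowercase filter": split on whitespace, lowercase each token.
    static Set<String> terms(List<String> docs) {
        Set<String> out = new TreeSet<>();
        for (String doc : docs) {
            for (String tok : doc.trim().split("\\s+")) {
                if (!tok.isEmpty()) {
                    out.add(tok.toLowerCase(Locale.ROOT));
                }
            }
        }
        return out;
    }

    // Terms that occur in a but not in b.
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> diff = new TreeSet<>(a);
        diff.removeAll(b);
        return diff;
    }

    public static void main(String[] args) {
        // Made-up samples mimicking the two casings described in this thread.
        List<String> lower = Arrays.asList("main street boston", "elm road austin");
        List<String> mixed = Arrays.asList("Main Street BOSTON", "Elm Road AUSTIN");
        System.out.println("only in lower-case data: " + onlyIn(terms(lower), terms(mixed)));
        System.out.println("only in mixed-case data: " + onlyIn(terms(mixed), terms(lower)));
    }
}
```

If either difference is non-empty on the real data, the inputs differ in more than casing, which could explain the few different entries in the results.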

Yes, I am absolutely sure nothing else changed. I kept all those things the same across the two runs.

Actually, does the Lucene repository have these kinds of experiments across versions (major or minor)?

If I were Lucene, I would run these experiments to see the impact on the resulting index. This would help uncover potential unidentified bugs.

Methodology:

Have a large dataset, e.g. 15 million docs.

Run the indexing each time a new version comes out, with very common settings.


I am not using Solr, just pure Lucene 7.7.2. This information was in my other email; let me copy and paste it here:



===== previous email ====

On a related issue:

With version 7.7.2 I experienced this:

data is all lower case (same number of docs as the next case, though)

vs

data is camel case, except the last word, which is always in capital letters


However, I used the lowercase filter in the indexer in both cases, so everything is indexed in lower case. The index for the first case came to about 9.5GB, but the index for the second case, with the same amount of data, came to 11GB.


What causes such a difference and increase in index size? The number of docs is the same in both cases.


Best regards



On 11/13/20 7:39 AM, Erick Erickson wrote:
What does “final finished sizes” mean? After optimize or just after finishing all indexing?
The former is what counts here.

And you provided no information on the number of deleted docs in the two cases. Is the number of deletedDocs the same (or close)? And does the q=*:* query return the same numFound?

Finally, are you absolutely and totally sure that no other options changed? For instance, you specified docValues=true for some field in one but not the other. Or stored=true etc. If you’re using the same schema.

And you also haven’t provided information on what versions of Solr you’re talking about. You mention 7.7.2, but not the _other_ version of solr. If you’re going from one major version to another, sometimes defaults change for docValues on primitive fields especially. I’d consider firing up Luke and examining the field definitions in detail.

Best,
Erick

On Nov 13, 2020, at 12:16 AM, baris.ka...@oracle.com wrote:

Hi,-
Thanks.
These are final finished sizes in both cases.
Best regards


On Nov 12, 2020, at 11:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:

Yes, that issue is fixed. The “Resolution” tag is the key, it’s marked “fixed” and the version is 8.0

As for your other question, index size is a very imprecise number. How many deleted documents are there in each case? Deleted documents take up disk space until the segments containing them are merged away.

Best,
Erick

On Nov 12, 2020, at 5:35 PM, baris.ka...@oracle.com wrote:

https://issues.apache.org/jira/browse/LUCENE-8448


Hi,-

Is this issue fixed, please? Could you please help me figure it out?

Best regards



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


