Nothing changed between the two index generations except that the data
changed slightly, as I described.
The size I am reporting is the size of the directory where all index
files are stored, measured once Lucene has finished generating the index.
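
Since a single directory total hides which part of the index grew, a per-extension breakdown may help when comparing the two runs. The following is only a sketch using the plain JDK (no Lucene API); the class name `IndexSizeByExtension` and its methods are made up for illustration, and it assumes the index directory is flat, as Lucene index directories normally are.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: sums index file sizes per extension.
final class IndexSizeByExtension {

    /** Sums the sizes of regular files directly inside indexDir, keyed by extension. */
    static Map<String, Long> sizeByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> files = Files.list(indexDir)) {
            for (Path file : (Iterable<Path>) files::iterator) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files such as segments_N have no extension.
                String ext = dot < 0 ? "(none)" : name.substring(dot + 1);
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    /** Adds up all per-extension totals; this is the overall directory size. */
    static long totalSize(Map<String, Long> totals) {
        long sum = 0;
        for (long bytes : totals.values()) {
            sum += bytes;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // Pass the index directory as the first argument; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, Long> totals = sizeByExtension(dir);
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            System.out.printf("%-8s %,d bytes%n", e.getKey(), e.getValue());
        }
        System.out.printf("%-8s %,d bytes%n", "total", totalSize(totals));
    }
}
```

Running this against both index directories and comparing the output line by line would show whether the extra space sits in, for example, the stored-fields files (.fdt) or the term dictionary and postings files (.tim, .doc, .pos). One possibility worth checking is stored fields: they hold the original, un-analyzed text, so the original casing still reaches disk there even when a lowercase filter is applied to the indexed terms.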
I don't know about deleted docs. How do you trace that? Yes, the queries
run exactly the same way (same number of results). Most of the time only
the order changes, which is fine; occasionally a few different entries
show up, and I don't know why, since the lowercase filter should
normalize the terms even if the casing of the original data changes.
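
To sanity-check the expectation that both casings normalize to the same terms, here is a small JDK-only sketch. It does not use Lucene's LowerCaseFilter; it only mimics per-code-point lowercasing with `Character.toLowerCase`, and it assumes whitespace tokenization, which may differ from the analyzer actually in use. The class name `LowercaseCheck` and the sample strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analysis chain: whitespace split plus lowercasing.
final class LowercaseCheck {

    /** Lower-cases one code point at a time, independent of the default locale. */
    static String lowerCase(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().map(Character::toLowerCase).forEach(out::appendCodePoint);
        return out.toString();
    }

    /** Splits on whitespace (an assumption) and lower-cases each token. */
    static List<String> terms(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(lowerCase(token));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String lower = "new york city"; // first case: all lower case
        String mixed = "New York CITY"; // second case: camel case, last word in capitals
        // Prints true when both variants produce identical term lists.
        System.out.println(terms(lower).equals(terms(mixed)));
    }
}
```

If the term lists match like this for the real data, the lowercase step itself is probably not the source of the differing results, and the tokenizer or other filters in the chain would be the next thing to compare between the two runs.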
Yes, I am absolutely sure nothing else changed. I kept all those things
the same across the two runs.
Actually, does the Lucene repository have these kinds of experiments
across versions (major or minor)?
If I were Lucene, I would run these experiments to see the impact on the
resulting index. This would help find potential unidentified bugs.
Methodology:
Have a large dataset, e.g. 15 million docs.
Run the indexer each time a new version comes out, with very common settings.
I am not using Solr, just pure Lucene 7.7.2. This information was in the
other email; let me copy and paste it here:
===== previous email ====
On a related issue:
With version 7.7.2 I experienced this:
data is all lower case (same number of docs as the next case)
vs
data is camel case, except the last word, which is always in capital letters.
But I used the lowercase filter in the indexer in both cases, so indexing
is done entirely in lower case. The index size for the first case is
about 9.5GB, but for the same amount of data the second case was 11GB.
What causes such a difference and increase in index size? The number of
docs is the same in both cases.
Best regards
On 11/13/20 7:39 AM, Erick Erickson wrote:
What does “final finished sizes” mean? After optimize or just after finishing
all indexing?
The former is what counts here.
And you provided no information on the number of deleted docs in the two
cases. Is the number of deletedDocs the same (or close)? And does the
q=*:* query return the same numFound?
Finally, are you absolutely and totally sure that no other options changed?
For instance, you specified docValues=true for some field in one but not the
other. Or stored=true etc. If you’re using the same schema.
And you also haven’t provided information on what versions of Solr you’re
talking about. You mention 7.7.2, but not the _other_ version of Solr. If
you’re going from one major version to another, sometimes defaults change
for docValues on primitive fields especially. I’d consider firing up Luke
and examining the field definitions in detail.
Best,
Erick
On Nov 13, 2020, at 12:16 AM, baris.ka...@oracle.com wrote:
Hi,-
Thanks.
These are final finished sizes in both cases.
Best regards
On Nov 12, 2020, at 11:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:
Yes, that issue is fixed. The “Resolution” tag is the key, it’s marked
“fixed” and the version is 8.0
As for your other question, index size is a very imprecise number. How many
deleted documents are there in each case? Deleted documents take up disk
space until the segments containing them are merged away.
Best,
Erick
On Nov 12, 2020, at 5:35 PM, baris.ka...@oracle.com wrote:
https://issues.apache.org/jira/browse/LUCENE-8448
Hi,-
Is this issue fixed, please? Could you please help me figure it out?
Best regards
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org