Hi Bryan,

Search result issues can be difficult to track down without very specific 
examples or even links to a public website (feel free to use our 
demo7.dspace.org site to try to reproduce issues).
Usually, it's best to look for common patterns in the results you are seeing, 
as that helps us track down what the affected items have in common. For 
example, if all the files that do not match searches properly are PDFs, that's 
a clue. If they are all large files, that'd be a different clue. And if you 
find a specific metadata field isn't searchable, that's yet another clue.
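
As a rough sketch of that kind of pattern hunting, assuming the items matched by each instance have been collected by hand: the handles, formats, and sizes below are made-up placeholders, and `missing_by` is a hypothetical helper, not part of DSpace.

```python
from collections import Counter


def missing_by(old_hits, new_hits, attribute):
    """Count items the old instance matched but the new one did not,
    grouped by one attribute (e.g. file format)."""
    missing = set(old_hits) - set(new_hits)
    return Counter(old_hits[handle][attribute] for handle in missing)


# Placeholder search results keyed by item handle (not real data).
hits_58 = {
    "123456789/1": {"format": "pdf", "size_mb": 12},
    "123456789/2": {"format": "pdf", "size_mb": 48},
    "123456789/3": {"format": "docx", "size_mb": 1},
}
hits_73 = {
    "123456789/3": {"format": "docx", "size_mb": 1},
}

# Which formats dominate among the items only 5.8 found?
print(missing_by(hits_58, hits_73, "format"))
```

If one value dominates the count (all PDFs, say), that is the sort of clue worth reporting back to the list.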

Since you specified that one difference is in searching the full text of a 
document, it's possible that changes/updates to the full text indexing in 
DSpace 7.3 could be impacting your results.

For instance, by default in DSpace 7.3, only the first 100,000 characters of a 
document are searchable. However, you can change this default in a 
configuration here: 
https://github.com/DSpace/DSpace/blob/main/dspace/config/dspace.cfg#L492-L498

(Notice in the comments that you'd have to re-extract text and re-index if you 
change this setting. Instructions are in those comments.)
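
One way to test this guess on a single item is to check whether the keyword first appears beyond the limit. This is only a sketch: it assumes you have the document's extracted text available as a plain string, `keyword_within_limit` is a hypothetical helper, and simple substring matching is not how Solr actually tokenizes and matches text.

```python
DEFAULT_CHAR_LIMIT = 100_000  # the DSpace 7.3 default mentioned above


def keyword_within_limit(text, keyword, limit=DEFAULT_CHAR_LIMIT):
    """Return True if the keyword occurs (case-insensitively) entirely
    within the first `limit` characters of the text."""
    return keyword.lower() in text[:limit].lower()


# Placeholder document: the keyword appears only after 150,000 filler characters.
late_only = "x" * 150_000 + " photosynthesis"
early_too = "photosynthesis " + late_only

print(keyword_within_limit(late_only, "photosynthesis"))  # False
print(keyword_within_limit(early_too, "photosynthesis"))  # True
```

If the items that 7.3 misses all return False here, the character limit is a likely cause.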

That's a very quick guess, though, based on the limited info you've been able 
to provide so far. I'd recommend looking more closely at your results for 
patterns or common clues. That might help us figure out what the cause may be 
(and whether it's a bug, or maybe just a configuration that needs to be 
tweaked).

Tim
________________________________
From: [email protected] <[email protected]> on behalf of 
Snickers <[email protected]>
Sent: Wednesday, August 31, 2022 3:59 PM
To: DSpace Technical Support <[email protected]>
Subject: [dspace-tech] Re: Issue with Dspace 7 search

Hi Tim,

Thank you for your response. I am sure that there have been many improvements 
made to DSpace and Solr over the version updates and appreciate the effort of 
the devs.

I looked a bit deeper into the search results from both 5.8 and 7.3. It seems 
that the search does find the keyword in the thesis text: I found an item 
where the keyword is mentioned once in the text and the search found it. 
However, I also found a few items where the keyword appears one or more times 
in the text that 7.3 did not find but 5.8 did.

Where could this be looked into to resolve the issue? The number of 
items is similar and the items look to have been migrated successfully. I have 
successfully run the full reindex commands found in Step 4 of the migration doc:
# Reindex all your content in DSpace
./dspace index-discovery -b

# (Optionally) also reindex everything into OAI-PMH endpoint
./dspace oai import

Please help. Any suggestion would be appreciated.

Regards,
Bryan

On Wednesday, August 31, 2022 at 5:33:24 AM UTC+12 Tim Donohue wrote:
Hi Bryan,

It's really hard to say what could be going on without you digging more into 
which items matched in 5.8 but didn't match in 7.3 (or vice versa).  It could 
be that 5.8 was actually returning incomplete results and the results are *more 
accurate* in 7.3.  Or, as you imply, it's also possible the other way 
around...somehow 7.3 isn't returning results as accurate as 5.8's.

But, it is worth pointing out that the Solr search settings under DSpace are 
enhanced little by little in every release. So, there were many 
changes/improvements in 6.x and continue to be more in 7.x.  We've also 
upgraded Solr several times in those releases, so it's possible that Solr 
itself is returning slightly different results based on its new/updated 
behavior.

Overall, until you dig more deeply into those search result differences between 
5.x vs 7.x, I wouldn't assume that there's a bug in 7.x. There's also the 
possibility you are just seeing improvements that resulted in more accurate 
results.  But, that said, if you are able to pinpoint some sort of buggy 
behavior, then definitely let us know & we'll work to get it assigned and fixed 
in a future 7.x release.

Tim

On Monday, August 29, 2022 at 11:32:51 PM UTC-5 [email protected] wrote:
Hi,

I am migrating DSpace from 5.8 to a new 7.3. I have followed the documentation 
and completed all tasks - 
https://wiki.lyrasis.org/display/DSDOC7x/Migrating+DSpace+to+a+new+server and 
https://wiki.lyrasis.org/display/DSDOC7x/Upgrading+DSpace

When I search, for example, I get 26 items from DSpace 7.3 whereas I get 79 
from DSpace 5.8. When searching for an empty space, I get similar total counts, 
e.g. 5172 and 5192.

Did anyone experience this? To be clear, I have run the reindexing commands 
found in the documentation above and the commands were completed successfully.

There was no useful log found since this is technically not an error.

Any idea or suggestion would be appreciated.

Regards,
Bryan


--
All messages to this mailing list should adhere to the Code of Conduct: 
https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/2a800873-a3e5-4b5f-ba6c-b04be22139cen%40googlegroups.com.
