Hi Bryan,

Search result issues can be difficult to track down without very specific examples or links to a public website (feel free to use our demo7.dspace.org site to try to reproduce issues). Usually, it's best to look for common patterns in the results you are seeing, as that may help us track down what those behaviors have in common. For example, if all the files that do not match searches properly are PDFs, that's a clue. If they are all large files, that's a different clue. If you find that a specific metadata field isn't searchable, that's yet another clue.
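
As a rough illustration of that kind of pattern hunting, here is a minimal Python sketch. It assumes you have already exported the hits for the same query from both installs into lists of records. The field names (handle, format, size_mb), the 10 MB cut-off and the sample data are all made up for the example; none of this is a DSpace API.

```python
from collections import Counter

# Hypothetical cut-off for calling a file "large"; pick whatever suits your data.
LARGE_FILE_MB = 10


def missing_items(old_hits, new_hits):
    """Return the records the old install found that the new install did not."""
    new_handles = {hit["handle"] for hit in new_hits}
    return [hit for hit in old_hits if hit["handle"] not in new_handles]


def summarize(items):
    """Count items by file format and by a coarse size bucket."""
    by_format = Counter(item["format"] for item in items)
    by_size = Counter(
        "large" if item["size_mb"] >= LARGE_FILE_MB else "small" for item in items
    )
    return by_format, by_size


# Made-up sample data standing in for exported search results.
old_hits = [
    {"handle": "123/1", "format": "pdf", "size_mb": 42},
    {"handle": "123/2", "format": "pdf", "size_mb": 1},
    {"handle": "123/3", "format": "docx", "size_mb": 15},
]
new_hits = [
    {"handle": "123/2", "format": "pdf", "size_mb": 1},
]

missing = missing_items(old_hits, new_hits)
by_format, by_size = summarize(missing)
print("Missing handles:", [item["handle"] for item in missing])
print("By format:", dict(by_format))
print("By size:", dict(by_size))
```

If nearly all of the missing items fall into one bucket (one format, or only the large files), that is the sort of clue worth reporting back to the list.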

Since you specified that one difference is in searching the full text of a document, it's possible that changes to full-text indexing in DSpace 7.3 are affecting your results. For instance, by default in DSpace 7.3, only the first 100,000 characters of a document are searchable. You can change this default in the configuration here: https://github.com/DSpace/DSpace/blob/main/dspace/config/dspace.cfg#L492-L498 (Note the comments there: if you change this setting, you have to re-extract text and re-index. Instructions are in those comments.)

That's a very quick guess, though, based on the limited info you've been able to provide so far. I'd recommend looking more closely at your results for patterns or common clues that might help us figure out what the cause may be (and whether it's a bug, or just a configuration that needs to be tweaked).

Tim

________________________________
From: [email protected] <[email protected]> on behalf of Snickers <[email protected]>
Sent: Wednesday, August 31, 2022 3:59 PM
To: DSpace Technical Support <[email protected]>
Subject: [dspace-tech] Re: Issue with Dspace 7 search

Hi Tim,

Thank you for your response. I am sure that there have been many improvements made to DSpace and Solr over the version updates, and I appreciate the effort of the devs.

I looked a bit deeper into the search results from both 5.8 and 7.3. It seems that the search does find the keyword in the thesis text: I found an item where the keyword is mentioned once in the text and the search found it. However, I also found a few items where the keyword appeared one or more times in the text that 7.3 did not find but 5.8 did. Where could this be looked into to resolve the issue? The number of items is similar and the items appear to have migrated successfully.

I have successfully run the full reindex commands found in Step 4 of the migration doc:

    # Reindex all your content in DSpace
    ./dspace index-discovery -b
    # (Optionally) also reindex everything into OAI-PMH endpoint
    ./dspace oai import

Please help. Any suggestion would be appreciated.

Regards,
Bryan

On Wednesday, August 31, 2022 at 5:33:24 AM UTC+12 Tim Donohue wrote:

Hi Bryan,

It's really hard to say what could be going on without you digging more into which items matched in 5.8 but didn't match in 7.3 (or vice versa). It could be that 5.8 was actually returning incomplete results and the results are *more accurate* in 7.3. Or, as you imply, it could be the other way around: somehow 7.3 isn't returning results as accurate as those from 5.8.

It is worth pointing out, though, that the Solr search settings under DSpace are enhanced little by little in every release. There were many changes and improvements in 6.x, and more continue to arrive in 7.x. We've also upgraded Solr several times across those releases, so it's possible that Solr itself is returning slightly different results based on its new or updated behavior.

Overall, until you dig more deeply into those search result differences between 5.x and 7.x, I wouldn't assume that there's a bug in 7.x. There's also the possibility that you are just seeing improvements that resulted in more accurate results. That said, if you are able to pinpoint some sort of buggy behavior, then definitely let us know and we'll work to get it assigned and fixed in a future 7.x release.

Tim

On Monday, August 29, 2022 at 11:32:51 PM UTC-5 [email protected] wrote:

Hi,

I am migrating DSpace from 5.8 to a new 7.3. I have followed the documentation and completed all tasks:
https://wiki.lyrasis.org/display/DSDOC7x/Migrating+DSpace+to+a+new+server
and
https://wiki.lyrasis.org/display/DSDOC7x/Upgrading+DSpace

When I search, for example, I get 26 items from DSpace 7.3 whereas I get 79 from DSpace 5.8.
When searching for an empty space, I get similar total counts, e.g. 5172 and 5192.

Has anyone experienced this? To be clear, I have run the reindexing commands found in the documentation above and they completed successfully. I found no useful log output, since this is technically not an error.

Any idea or suggestion would be appreciated.

Regards,
Bryan

--
All messages to this mailing list should adhere to the Code of Conduct: https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/2a800873-a3e5-4b5f-ba6c-b04be22139cen%40googlegroups.com.
