There is a companion setting in discovery.cfg,
discovery.solr.fulltext.charLimit, which limits the number of characters
that are actually stored in the fulltext field of the Solr index. By
default it is also set to 100000 characters.  Set it to a higher count,
or to -1 for unlimited.
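
As a rough sketch, the two limits together might look like the fragment
below. It assumes you keep local overrides in local.cfg rather than
editing discovery.cfg and dspace.cfg directly, and the values are only
illustrative. Text that was extracted or indexed before the change is not
updated on its own, so you would most likely need to re-run the media
filter and rebuild the Discovery index afterwards (commands shown as
comments; check the options against your version).

```
# [dspace]/config/local.cfg  (assumed location for local overrides)

# Maximum characters the text extractor pulls from each file; -1 = unlimited
textextractor.max-chars = -1

# Maximum characters of extracted text stored in Solr's fulltext field; -1 = unlimited
discovery.solr.fulltext.charLimit = -1

# Afterwards, from [dspace]/bin:
#   ./dspace filter-media -f      # force re-extraction of text from bitstreams
#   ./dspace index-discovery -b   # rebuild the Discovery (Solr) index
```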

Hope that helps!
~~Bill

On Thu, Mar 2, 2023 at 10:50 AM 'Tim Donohue' via DSpace Community <
[email protected]> wrote:

> Hi Erivelto,
>
> I'd recommend looking more closely at the 5 items which were matched in
> DSpace 6.3 but not in 7.5.  Is there something in common among those 5
> items?  Is the search match occurring in the metadata of those
> items or in the full text?
>
> If you can narrow things down, it'd be much easier to provide
> support/ideas.  There have been a lot of changes in the search engine of
> DSpace 7.5... including a move to a later version of Solr.  It's possible
> you've found a bug, or it could be a misconfiguration, or simply a change
> in the behavior of Solr.  It's difficult to narrow down without more
> information about the differences in the results that you are seeing.
>
> If you can send more information to this list or your email to dspace-tech
> (as I see you sent the same email to both lists), that might provide others
> with more clues as to what might be going on.
>
> Tim
> ------------------------------
> *From:* [email protected] <
> [email protected]> on behalf of Erivelto Henrique <
> [email protected]>
> *Sent:* Thursday, March 2, 2023 7:41 AM
> *To:* DSpace Community <[email protected]>
> *Subject:* [dspace-community] Differences in search result on items
> between DSpace 6.3 / DSpace 7.5
>
> Hi everyone;
>
> I have a DSpace 6.3 installation and am deploying a new document
> repository with version 7.5.
> We have already installed version 7.5 on a new server and imported
> some documents into the new installation.
> We ran some search tests and noticed a very big difference in search
> results between the two versions.
> When I search for a term in version 6.3, I get 14 results, and when I
> search in version 7.5, I get only 9.
> Version 6.3 search result
> [image: Screenshot_8.png]
> Search result in version 7.5
> [image: Screenshot_10.png]
>
> Some of the PDF documents are very large, and with the Text Extractor
> limit set to 100k characters the files were not being fully converted to
> TXT. I changed it to textextractor.max-chars = -1, but the search
> result remains the same.
>
> Can anyone help with this?
>
> Thanks
>
> Erivelto
>
> --
> All messages to this mailing list should adhere to the Code of Conduct:
> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
> ---
> You received this message because you are subscribed to the Google Groups
> "DSpace Community" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/dspace-community/9ba0b682-523b-4709-a6cb-de2284f1f90bn%40googlegroups.com
> <https://groups.google.com/d/msgid/dspace-community/9ba0b682-523b-4709-a6cb-de2284f1f90bn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>


-- 
Human wheels spin round and round
While the clock keeps the pace... -- John Mellencamp
________________________________________________________________
Bill Tantzen    University of Minnesota Libraries
612-626-9949 (U of M)    612-325-1777 (cell)

