Hi Bill!

I was wrong: the companion setting does exist in discovery.cfg (discovery.solr.fulltext.charLimit). I changed it to -1, and now DSpace finds all occurrences in the search.
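For reference, here is a minimal sketch of the two settings involved, both set to unlimited. It assumes the default DSpace 7 layout, with dspace.cfg and modules/discovery.cfg under [dspace]/config; the same keys can instead be overridden in local.cfg. As noted in this thread, both limits start at 100000 characters. textextractor.max-chars caps how much text is extracted from each file into the TXT, while discovery.solr.fulltext.charLimit separately caps how much of that text is stored in the Solr fulltext field. This is consistent with what was reported here: raising only the first one did not change the search results.

```properties
# [dspace]/config/dspace.cfg (or local.cfg)
# Maximum number of characters the text extractor writes to each
# extracted TXT file. -1 removes the limit.
textextractor.max-chars = -1

# [dspace]/config/modules/discovery.cfg (or local.cfg)
# Maximum number of full-text characters stored in the Solr "fulltext"
# field when an item is indexed. -1 removes the limit.
discovery.solr.fulltext.charLimit = -1
```

Neither setting changes content that was already processed. The text usually has to be re-extracted (for example with [dspace]/bin/dspace filter-media -f), and the Discovery index has to be rebuilt (for example with [dspace]/bin/dspace index-discovery -b). Removing both limits also makes the index larger, so a high finite value is an alternative to -1.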
But now I have another problem. I uploaded more PDF files, and when I run the command ./dspace index-discovery, some of those files are erased.

Thank you.

Erivelto

On Thu, Mar 2, 2023 at 22:43, Erivelto Alves <[email protected]> wrote:

> Hi Bill & Tim!
>
> This is the server spec:
>
> Server DSpace App
> Ubuntu Server 22.04.2 LTS
> openjdk version "11.0.18" 2023-01-17
> OpenJDK Runtime Environment (build 11.0.18+10-post-Ubuntu-0ubuntu122.04)
> OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Ubuntu-0ubuntu122.04,
> mixed mode, sharing)
> Tomcat 9.0.72
> Solr 8.11.2
> Apache Maven 3.6.3
> Apache Ant 1.10.12
> NodeJS v16.19.1
> npm 8.19.3
> -----------------------------
> Server Database
> Ubuntu Server 22.04.2 LTS
> PostgreSQL 14.6 (Ubuntu 14.6-0ubuntu0.22.04.1)
>
> There were no errors during the DSpace installation.
> The PDF files compared in DS6.5 and DS7.5 are the same, and they were
> batch imported using the Simple Archive Format (SAF). No import errors.
> I modified the text extractor configuration in the dspace.cfg file,
> setting textextractor.max-chars = -1. All files were 100% converted to TXT.
>
> Bill, there is no companion setting in discovery.cfg
> (discovery.solr.fulltext.charLimit).
>
> The search failure remains.
>
> Thanks.
>
> Erivelto
>
>
> On Thu, Mar 2, 2023 at 14:27, Bill Tantzen <[email protected]>
> wrote:
>
>> There is a companion setting in discovery.cfg
>> (discovery.solr.fulltext.charLimit) which limits the number of characters
>> that are actually stored in the Solr index in the fulltext field;
>> initially, that is also set to 100000 characters. Simply set this to a
>> higher count, or -1 for unlimited.
>>
>> Hope that helps!
>> ~~Bill
>>
>> On Thu, Mar 2, 2023 at 10:50 AM 'Tim Donohue' via DSpace Community <
>> [email protected]> wrote:
>>
>>> Hi Erivelto,
>>>
>>> I'd recommend looking more closely at the 5 items which were matched in
>>> DSpace 6.3 but not in 7.5. Is there something in common among those 5
>>> items?
>>> Is the search match occurring in the metadata of those
>>> items or in the full text?
>>>
>>> If you can narrow things down, it'd be much easier to provide
>>> support/ideas. There have been a lot of changes in the search engine of
>>> DSpace 7.5... including a move to a later version of Solr. It's possible
>>> you've found a bug, or it could be a misconfiguration, or simply a change
>>> in the behavior of Solr. It's difficult to narrow down without more
>>> information about the differences in the results that you are seeing.
>>>
>>> If you can send more information to this list or your email to
>>> dspace-tech (as I see you sent the same email to both lists), that might
>>> provide others with more clues as to what might be going on.
>>>
>>> Tim
>>> ------------------------------
>>> *From:* [email protected] <
>>> [email protected]> on behalf of Erivelto Henrique <
>>> [email protected]>
>>> *Sent:* Thursday, March 2, 2023 7:41 AM
>>> *To:* DSpace Community <[email protected]>
>>> *Subject:* [dspace-community] Differences in search result on items
>>> between DSpace 6.3 / DSpace 7.5
>>>
>>> Hi everyone,
>>>
>>> I have a DSpace 6.3 installation and am deploying a new document
>>> repository with version 7.5.
>>> We have already installed the new version 7.5 on a new server, and we
>>> have imported some documents into this new installation.
>>> We did some search tests and noticed a very big difference in search
>>> results between the two versions.
>>> When I search for a term in version 6.3, I get 14 results, and when I
>>> search for it in version 7.5, I only get 9.
>>> Version 6.3 search result
>>> [image: Screenshot_8.png]
>>> Search result in version 7.5
>>> [image: Screenshot_10.png]
>>>
>>> Some of the PDF documents are very large, and with the Text Extractor
>>> set to 100k characters, those files were not being fully converted to
>>> TXT.
>>> I changed it to textextractor.max-chars = -1, but the search
>>> results remain the same.
>>>
>>> Can anyone help with this?
>>>
>>> Thanks
>>>
>>> Erivelto
>>>
>>> --
>>> All messages to this mailing list should adhere to the Code of Conduct:
>>> https://www.lyrasis.org/about/Pages/Code-of-Conduct.aspx
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "DSpace Community" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/dspace-community/9ba0b682-523b-4709-a6cb-de2284f1f90bn%40googlegroups.com
>>
>>
>> --
>> Human wheels spin round and round
>> While the clock keeps the pace...
>>     -- John Mellencamp
>> ________________________________________________________________
>> Bill Tantzen University of Minnesota Libraries
>> 612-626-9949 (U of M) 612-325-1777 (cell)
