All,

We discussed this topic in yesterday's Developers Meeting <https://wiki.lyrasis.org/display/DSPACE/2026-02-05+DSpace+Developers+Meeting>, and realized that the data within the <script> tag is the *cache* (application state) being passed from the server side to the client side. This is how Angular applications pass this cached data.
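
If you want to check whether your own pages carry bitstream links in that cached state, here is a rough Python sketch (not an official DSpace tool). It assumes the state is serialized in a <script type="application/json"> tag and that bitstream links contain "bitstreams/<uuid>". The helper names and the sample HTML are made up for illustration, so adjust them to what your site actually emits.

```python
import re
from html.parser import HTMLParser

# Assumption: bitstream links in the cached REST responses contain
# "bitstreams/<uuid>". Adjust the pattern if your URLs look different.
BITSTREAM_RE = re.compile(
    r"bitstreams/([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


class JsonScriptCollector(HTMLParser):
    """Collects the text of every <script type="application/json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._in_json_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_script = False

    def handle_data(self, data):
        if self._in_json_script:
            self.chunks.append(data)


def bitstream_uuids_in_state(html):
    """Return sorted, de-duplicated bitstream UUIDs found in JSON <script> tags."""
    collector = JsonScriptCollector()
    collector.feed(html)
    collector.close()
    found = set()
    for chunk in collector.chunks:
        found.update(uuid.lower() for uuid in BITSTREAM_RE.findall(chunk))
    return sorted(found)


# Hypothetical SSR output, for illustration only.
sample = """<html><body><app-root></app-root>
<script id="state" type="application/json">
{"cache": {"href": "https://demo.example.org/server/api/core/bitstreams/11111111-2222-3333-4444-555555555555/content"}}
</script></body></html>"""

print(bitstream_uuids_in_state(sample))
```

To use it on a real site, save the server-rendered HTML of an item page (for example with curl) and pass its contents to bitstream_uuids_in_state(). Note that it only lists the UUIDs it finds. It does not tell you which bundle each bitstream belongs to, so you would still need to look those up to see whether any are in the TEXT bundle.
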

However, it *is possible* to turn this cache off within DSpace by setting this in your config.*.yml:

    ssr:
      transferState: false

This setting turns off the passing of REST queries in that cache, so the <script> tag will no longer send all the REST queries previously run. This *might* fix the Google Scholar issues (assuming that Google Scholar is finding these bitstreams via that <script> tag), but it does have a side effect.

If you turn off this cache, you may see a small number of *duplicate* REST API queries when your site switches from SSR to CSR on the first page visit. Essentially, during SSR, the server-side code calls the REST API to build the HTML. After the switch to CSR, some of those same REST API calls may be made again by the page in the user's browser. However, I'm told by others who use this "transferState: false" setting in production that the performance impact is minimal, simply because once you switch to CSR you stay in CSR. So these potentially duplicate queries may only occur on the first page you visit in the site.

A new ticket has been created to describe this problem in more detail: https://github.com/DSpace/DSpace/issues/11871 (I've also added these same notes in a comment on that ticket.)

While disabling this cache is a potential "quick fix", the more correct solution is likely to restrict access to these bundles/bitstreams altogether (as they are only needed within DSpace). So the other ticket, https://github.com/DSpace/DSpace/issues/11681, is still the more valid solution for the long term. I'm hoping to find a volunteer to start investigating default access restrictions on these bundles.

If you have other questions, feel free to ask them here or in the tickets.

Tim

On Wednesday, February 4, 2026 at 2:58:35 PM UTC-6 DSpace Technical Support wrote:

> Hi All,
>
> Thanks for the additional details everyone.
> This sounds like it's occurring in several institutions, which
> definitely implies this is a more widespread issue in Google Scholar's
> indexing of DSpace sites.
>
> Regarding the TEXT bitstream URL appearing in the <script> tag: I'm seeing
> what you mean, Bill. Now that I look closely, I'm seeing it also on our
> demo site. There does seem to be some extraneous JSON in that <script> tag
> that looks like cached responses from the REST backend.... I'm not exactly
> sure where that's coming from, and it *does* seem to sometimes include the
> URL of the TEXT bundle file.
>
> My guess would be that *might be* where Google Scholar is finding the
> link, but I cannot say with any certainty. They obviously don't share all
> the information on how they index sites. But, I do know that Google
> Scholar uses the SSR (server side rendered) HTML page. Their bots don't use
> OAI or anything else like that.
>
> I'll bring this up in tomorrow's DSpace Developers Meeting to see if
> anyone has brainstorms on a possible fix. It sounds like either we need to
> find what is adding that extraneous JSON (and it could be something in
> Angular), or we may need to re-prioritize a fix for the TEXT bundle
> permissions discussion (that Sascha noted) that was logged in
> https://github.com/DSpace/DSpace/issues/11681.
>
> Tim
>
> On Wednesday, February 4, 2026 at 10:09:34 AM UTC-6 Andrew K wrote:
>
>> Hello,
>>
>> It looks like the extracted text is intended for internal search, right?
>> Then it should never be exposed.
>>
>> WBR,
>> Andrew
>>
>> On Wednesday, February 4, 2026 at 10:02:32 UTC+2 Sascha Szott wrote:
>>
>> Hello everyone,
>>
>> just a small note regarding the discussion: we already talked about the
>> topic of bitstreams in the TEXT bundle in a developer meeting last year.
>>
>> This resulted in the GitHub ticket
>>
>> https://github.com/DSpace/DSpace/issues/11681.
>>
>> Presumably, we can restrict access to the bitstreams in the TEXT bundle.
>> Ideally, the URLs should not appear in the SSR output at all.
>>
>> Best
>> Sascha
