Hi Bill,

Where are you seeing these "14 instances of each TEXT bitstream" in the 
source of the HTML?  I'm not seeing that behavior on the demo site...or 
maybe I'm misunderstanding how to find it (or overlooking it)? 

Can you see those same references by viewing the source of an Item on the 
demo site?  For 
example: https://demo.dspace.org/items/bb3eb3d2-9796-4a6b-b08e-af914e2438a9

It is very possible that Google's crawlers are finding it in the HTML 
source if it's there.  Keep in mind though that (at least as far as I'm 
aware) Google Scholar's bots are only accessing the SSR (server side 
rendered) version of the page, i.e., the page you'd see if you turn off 
JavaScript in your browser.
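If it helps narrow things down, here is a rough Python sketch for checking where a given bitstream UUID shows up in the server-side rendered HTML. It fetches the page with urllib, which does not execute JavaScript, so the result should approximate the SSR page described above (a given crawler may still receive slightly different HTML). It then counts how often the UUID appears inside <script> elements versus elsewhere, such as in href attributes. It assumes you already know the UUID of one TEXT bitstream, presumably taken from a URL Google Scholar has indexed. The script name, User-Agent string and function names are only illustrative, not part of DSpace.

```python
import re
import sys
import urllib.request
from collections import Counter
from html.parser import HTMLParser

# Matches canonical UUIDs like the ones DSpace uses for Items and bitstreams.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.IGNORECASE
)


class UuidLocator(HTMLParser):
    """Counts UUIDs found inside <script> elements vs. everywhere else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_script = False
        self.in_script_counts = Counter()
        self.outside_script_counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Attribute values (e.g. an href pointing at a bitstream) count as "outside script".
        for _name, value in attrs:
            if value:
                self.outside_script_counts.update(m.lower() for m in UUID_RE.findall(value))
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Text nodes and raw <script> contents both arrive here.
        target = self.in_script_counts if self.in_script else self.outside_script_counts
        target.update(m.lower() for m in UUID_RE.findall(data))


def locate(html, uuid):
    """Return (count inside <script>, count elsewhere) for one UUID."""
    parser = UuidLocator()
    parser.feed(html)
    parser.close()
    key = uuid.lower()
    return parser.in_script_counts[key], parser.outside_script_counts[key]


def fetch_ssr_html(url):
    # urllib never runs JavaScript, so this is the server-side rendered HTML only.
    req = urllib.request.Request(url, headers={"User-Agent": "ssr-check/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python find_bitstream_refs.py <item-page-url> <bitstream-uuid>
    page_url, bitstream_uuid = sys.argv[1], sys.argv[2]
    in_script, outside = locate(fetch_ssr_html(page_url), bitstream_uuid)
    print(f"{bitstream_uuid}: {in_script} in <script>, {outside} elsewhere")
```

If the counts come back as all "in <script>" and zero "elsewhere", that would match what you describe (references only in the embedded script data, no visible links). Whether Google parses that script data is a separate question this check can't answer.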

Tim

On Tuesday, February 3, 2026 at 4:55:43 PM UTC-6 [email protected] wrote:

> Same here. Nothing in the UI, but when I view the source, I see 14 
> instances of each TEXT bitstream there.  Perhaps Google has learned to 
> parse them from there?  I cannot find a trace of them anywhere else.
> ~~Bill
>
> On Tue, Feb 3, 2026 at 4:44 PM DSpace Technical Support <
> [email protected]> wrote:
>
>> Hi Bill,
>>
>> I have to admit, I find this confusing too.  I'm also not aware of 
>> anywhere in the UI where we provide a *publicly available* link to files in 
>> the TEXT bundle.  If there is such a way that we are "exposing" the TEXT 
>> bundle to crawlers, then it's accidental.  Files in that TEXT bundle are 
>> not meant for public downloads.
>>
>> Are you able to get any clues from Google Scholar regarding which 
>> Items are linking to TEXT bundles?  Are they all newer content, or older 
>> content?  If older content, it's always possible this was a bug in an older 
>> version of DSpace.  If newer content, that implies maybe we're missing a 
>> place these are exposed in recent DSpace versions...that'd imply though 
>> that they'd be in the HTML *somewhere*, likely either on the Item page or 
>> the "Full" Item page.  (I'm not seeing them on either of those pages on our 
>> demo site though, e.g. 
>> https://demo.dspace.org/items/bb3eb3d2-9796-4a6b-b08e-af914e2438a9 or 
>> https://demo.dspace.org/items/bb3eb3d2-9796-4a6b-b08e-af914e2438a9/full 
>> ).  Either that, or Google Scholar's bot is finding links to them elsewhere 
>> on the web (which would be odd).
>>
>> Overall, I think this might require digging for more clues...or (as 
>> you've already done) seeing if others have seen this behavior as well.  
>> Either one might help us narrow things down.
>>
>> Tim
>>
>>
>>
>> On Tuesday, February 3, 2026 at 11:04:49 AM UTC-6 [email protected] wrote:
>>
>>> We are discovering extracted text from the TEXT bundle indexed in 
>>> Google Scholar.  I'm not sure how this is happening.  Bitstreams in the 
>>> TEXT bundle are referenced numerous times in the <script> element of the 
>>> source code, but not in the UI so far as I can tell.
>>>
>>> Is there a way to prevent these bitstreams from being indexed?
>>>
>>> Thanks for any tips!
>>> ~~Bill
>>>
>>> -- 
>>> ______________________________________
>>> Bill Tantzen    University of Minnesota Libraries
>>> 612-626-9949 (U of M)  612-325-1777 (mobile)
>>>
>> -- 
>> All messages to this mailing list should adhere to the Code of Conduct: 
>> https://lyrasis.org/code-of-conduct/
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "DSpace Technical Support" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion visit 
>> https://groups.google.com/d/msgid/dspace-tech/e642e6b8-7e83-4460-bb71-0879627bf17dn%40googlegroups.com.
>>
>
>
> -- 
> ______________________________________
> Bill Tantzen    University of Minnesota Libraries
> 612-626-9949 (U of M)  612-325-1777 (mobile)
>

-- 
All messages to this mailing list should adhere to the Code of Conduct: 
https://lyrasis.org/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/dspace-tech/131571b5-c253-41a9-989a-eb06541eb4d2n%40googlegroups.com.
