I was just checking our repository's status in the Google search index
and noticed that it is indexing bitstreams on both the /bitstream and
/rest URLs.
This seems like a waste of resources, since Googlebot crawls with over
fifty concurrent connections, but it also looks like a duplicate
content problem from a search engine optimization standpoint: the
repository is essentially competing with itself for the top hit on a
given keyword in the search results.
Has anyone else thought about this? I am considering forbidding or
discouraging bots from indexing /rest, either by returning HTTP 403,
via robots.txt, or with an "X-Robots-Tag: none" HTTP header.
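For the robots.txt route, a minimal sketch might look like this
(assuming DSpace is served at the site root; adjust the path if the
application runs under a context path like /xmlui or /jspui):

```
User-agent: *
Disallow: /rest/
```

Note that robots.txt only discourages crawling; URLs Google has
already discovered can remain in the index. Dropping them would need
the X-Robots-Tag response header (or a 403) set on /rest responses in
the front-end web server configuration.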
"In heaven all the interesting people are missing." ―Friedrich Nietzsche
You received this message because you are subscribed to the Google Groups
"DSpace Technical Support" group.