I was just checking our repository's status in the Google search index
and noticed that it is indexing bitstreams on both /bitstream and
/rest URLs, e.g.:

- https://repository.com/bitstream/10568/24495/1/24495.pdf
- https://repository.com/rest/bitstreams/91059/retrieve

This seems like a waste of resources, since Googlebot crawls with over
fifty concurrent connections, but it also creates a duplicate content
problem for search engine optimization: with the same bitstream
reachable at two URLs, you are essentially competing with yourself for
the top hit for a given keyword in the search results.

Has anyone else thought about this? I am considering
forbidding/discouraging bots from indexing /rest, either with an HTTP
403, via robots.txt, or with the "X-Robots-Tag: none" HTTP header.
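For reference, a minimal sketch of the two non-403 options (the paths
and the nginx front end are assumptions; adjust for your own setup):

```
# robots.txt — ask well-behaved crawlers not to fetch the REST API
User-agent: *
Disallow: /rest/
```

```
# nginx — tell crawlers not to index or follow anything served under /rest/
location /rest/ {
    add_header X-Robots-Tag "none";
}
```

One caveat: robots.txt only stops crawling, so already-known /rest URLs
can linger in the index, while X-Robots-Tag requires the URL to be
crawled before the header is seen, so the two approaches shouldn't be
combined on the same paths.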


Alan Orth
"In heaven all the interesting people are missing." ―Friedrich Nietzsche

You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.