Hi... We (at Cornell) have discovered a weird problem with Google harvesting and wonder if anyone else has seen this. One of our collections, Cornell Alumni News, contains PDF versions of archived editions of our Alumni News magazine (dating back to the 1800s). Each PDF contains scanned images with underlying OCR text. Each item is one volume with 12-18 bitstreams (PDFs), each of which represents an issue.
We have found that Google has harvested these issues inconsistently. For instance, for one volume we will find that 5 of the bitstreams appear in Google but 7 do not. Has anyone else seen anything like this? We are wondering if it may be a product of the size of the PDFs. The ones that weren't harvested seem to be consistently large (around 30 MB).

P.S. I did go through Google Webmaster Tools, but I couldn't find anything that indicated a problem on their end.

--
***************************
George Kozak
Digital Library Specialist
Division of Library Information Technologies (DLIT), Digital Media Group
501 Olin Library
Cornell University
607-255-8924
***************************
g...@cornell.edu