hi George,

as far as i know, Google does enforce limits on file size, but what the limit is may vary over time ...
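one rough way to test the size theory before drawing conclusions is to list the bitstreams of a volume with their sizes and whether each one turns up in Google, then see if size cleanly separates the two groups. the Python sketch below assumes you fill in that list by hand (for example from site: searches); the file names, sizes, and function names are made up for illustration, and nothing here is a documented Google limit.

```python
from statistics import median

# Hypothetical sample data: (bitstream name, size in bytes, found in Google?)
# Replace with the real bitstreams of one volume.
BITSTREAMS = [
    ("issue_01.pdf", 8_000_000, True),
    ("issue_02.pdf", 12_000_000, True),
    ("issue_03.pdf", 31_000_000, False),
    ("issue_04.pdf", 10_000_000, True),
    ("issue_05.pdf", 29_000_000, False),
    ("issue_06.pdf", 30_000_000, False),
]


def _to_mb(size_bytes):
    """Convert bytes to megabytes (10^6 bytes), rounded to one decimal."""
    return round(size_bytes / 1_000_000, 1)


def size_summary(bitstreams):
    """Median size in MB of harvested and of missing bitstreams."""
    harvested = [size for _, size, found in bitstreams if found]
    missing = [size for _, size, found in bitstreams if not found]
    return {
        "harvested_median_mb": _to_mb(median(harvested)) if harvested else None,
        "missing_median_mb": _to_mb(median(missing)) if missing else None,
    }


def clean_threshold(bitstreams):
    """Return (largest harvested size, smallest missing size) in bytes if
    every harvested file is smaller than every missing one, else None."""
    harvested = [size for _, size, found in bitstreams if found]
    missing = [size for _, size, found in bitstreams if not found]
    if not harvested or not missing:
        return None
    largest_harvested, smallest_missing = max(harvested), min(missing)
    if largest_harvested < smallest_missing:
        return (largest_harvested, smallest_missing)
    return None


if __name__ == "__main__":
    print(size_summary(BITSTREAMS))
    print(clean_threshold(BITSTREAMS))
```

if clean_threshold returns a pair, a size cutoff somewhere between the two values would be consistent with what you see; if it returns None, size alone does not explain which files went missing.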
from the perspective of electronic theses, a while back NDLTD was advised by Google that all theses should be single-PDF documents, as that fit in with their model - their digital object unit is apparently a single file. of course this does not work for most theses, and it was never implemented widely!

so if every file is considered independently, Google's processing will apply to individual files ... hence some may be filtered but others retained.

hope i'm not too far off the mark, and that this helps :)

ttfn,
----hussein

=====================================================================
hussein suleman ~ [email protected] ~ http://www.husseinsspace.com
=====================================================================

George Kozak wrote:
> Hi...
>
> We (at Cornell) have discovered a weird problem with Google harvesting
> and wonder if anyone else has seen this. One of our collections is
> called Cornell Alumni News and it contains PDF versions of archived
> editions of our Alumni News Magazine (dating back to the 1800's). Each
> of the PDFs contains scanned images with underlying OCR. Each item is
> one volume with 12-18 bitstreams (PDFs), each of which represents an
> issue.
>
> We have found that Google has harvested these issues inconsistently.
> For instance, for one volume we will find that 5 of the bitstreams
> appear in Google but 7 do not.
>
> Has anyone else seen anything like this? We are wondering if it may be
> a product of the size of the PDFs. The ones which weren't harvested
> seem consistently large (around 30MB).
>
> P.S. I did go through the Google WebMaster Tools, but I couldn't find
> anything that indicated a problem on their end.
>

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech

