hi George

as far as i know, Google does enforce a limit on file size, but what that 
limit is may vary over time ...

from the perspective of electronic theses, a while back NDLTD was 
advised by Google that all theses should be single-PDF documents, as 
that fit in with their model - their digital object unit is apparently a 
single file. of course this does not work for most theses, and it was 
never implemented widely!

so if every file is considered independently, Google's processing will 
apply to individual files ... hence some may be filtered but others 
retained.

hope i'm not too far off the mark, and that this helps :)

ttfn,
----hussein

=====================================================================
hussein suleman ~ [email protected] ~ http://www.husseinsspace.com
=====================================================================


George Kozak wrote:
> Hi...
> 
> We (at Cornell) have discovered a weird problem with Google harvesting 
> and wonder if anyone else has seen this.  One of our collections is 
> called Cornell Alumni News and it contains PDF versions of archived 
> editions of our Alumni News Magazine (dating back to the 1800s).  Each 
> of the PDFs contains scanned images with underlying OCR.  Each item is 
> one volume with 12-18 bitstreams (PDFs), each of which represents an issue.
> 
> We have found that Google has harvested these issues inconsistently.  
> For instance, for one volume we will find that 5 of the bitstreams 
> appear in Google but 7 do not. 
> 
> Has anyone else seen anything like this?  We are wondering if it may be 
> a product of the size of the PDFs.  The ones which weren't harvested 
> seem consistently large (around 30MB). 
> 
> P.S.  I did go through the Google WebMaster Tools, but I couldn't find 
> anything that indicated a problem on their end.
> 

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech