Nathan wrote:
> Dan Brickley wrote:
>
>> (trimming cc: list to LOD and DBPedia)
>>
>> On Wed, Apr 14, 2010 at 7:09 PM, Kingsley Idehen <[email protected]>
>> wrote:
>>
>>> My comment wasn't a "what is DBpedia?" lecture. It was about clarifying
>>> the crux of the matter, i.e., bandwidth consumption and its effects on
>>> other DBpedia users (as well as our own non-DBpedia related Web properties).
>>>
>> (Leigh)
>>
>>>> I was just curious about usage volumes. We all talk about how central
>>>> dbpedia is in the LOD cloud picture, and wondered if there were any
>>>> publicly accessible metrics to help add some detail to that.
>>>>
>>> Well here is the critical detail: people typically crawl DBpedia. They
>>> crawl it more than any other Data Space in the LOD cloud. They do so
>>> because DBpedia is still quite central to the burgeoning Web of
>>> Linked Data.
>>>
>> Have you considered blocking DBpedia crawlers more aggressively, and
>> nudging them to alternative ways of accessing the data? While it is a
>> shame to say 'no' to people trying to use linked data, this would be
>> more saying 'yes, but not like that...'.
>>
>>> When people aren't crawling, they are executing CONSTRUCTs or DESCRIBEs
>>> via SPARQL, which is still ultimately an "Export from DBpedia and
>>> Import to my data space" mindset.
>>>
>> That's useful to know, thanks. Do you have the impression that these
>> folk are typically trying to copy the entire thing, or to make some
>> filtered subset (by geographical view, topic, property etc.)? Can
>> studying these logs help provide different downloadable dumps that
>> would discourage crawlers?
>>
>>> That's as simple and precise as this matter is.
>>>
>>> From a SPARQL perspective, DBpedia is quite microscopic; it's when you
>>> factor in Crawler mentality and network bandwidth that issues arise, and
>>> we deliberately have protection in place for Crawlers.
>>>
>> Looking at http://wiki.dbpedia.org/OnlineAccess#h28-14 I don't see
>> anything discouraging crawlers. Where is the 'best practice' or
>> 'acceptable use' advice we should all be following, to avoid putting
>> needless burden on your servers and bandwidth?
>>
>> As you mention, DBpedia is an important and central resource, thanks
>> both to the work of the Wikipedia community, and those in the DBpedia
>> project who enrich and make available all that information. It's
>> therefore important that the SemWeb / Linked Data community takes care
>> to remember that these things don't come for free, that bills need
>> paying and that de-referencing is a privilege not a right. If there
>> are things we can do as a technology community to lower the cost of
>> hosting / distributing such data, or to nudge consumers of it in the
>> direction of more sustainable habits, we should do so. If there's not
>> so much the rest of us can do but say 'thanks!', ... then, ...er,
>> 'thanks!'. Much appreciated!
>>
>> Are there any scenarios around eg. BitTorrent that could be explored?
>> What if each of the static files in http://dbpedia.org/sitemap.xml
>> were available as torrents (or magnet: URIs)? I realise that would
>> only address part of the problem/cost, but it's a widely used
>> technology for distributing large files; can we bend it to our needs?
>>
>
> I'd like to add; could the /data/* and /page/* resources all be made
> static files? (if they are not already) + make use of http caching etc.
>
Yes.

> perhaps even the non-sparql dependent parts could be hosted on another
> machine purely for static content? perhaps an interim proxy which
> caches said resources permanently (then cache rebuild on request when a
> new dataset is upgraded)
>
Yes.

Kingsley

> regards!
>
> --

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
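
For anyone who wants to picture the interim proxy idea from this thread (cache the /data/* and /page/* resources permanently, rebuild on request after a dataset upgrade), here is a minimal Python sketch. It is only an illustration under stated assumptions, not how DBpedia is actually served: `DatasetCache`, `render`, `get`, `upgrade` and the version labels are all hypothetical names, and the cache lives in memory rather than on a separate static-content machine.

```python
from typing import Callable, Dict


class DatasetCache:
    """Keep rendered resources until the dataset version changes.

    Hypothetical sketch: `render` stands in for whatever expensive
    backend call produces a /data/* or /page/* document.
    """

    def __init__(self, render: Callable[[str], bytes], dataset_version: str) -> None:
        self._render = render
        self._version = dataset_version  # assumed release label, e.g. "v1"
        self._store: Dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        # Only the first request for a path reaches the backend;
        # later requests are answered from the cache.
        if path not in self._store:
            self._store[path] = self._render(path)
        return self._store[path]

    def upgrade(self, new_version: str) -> None:
        # A new dataset release empties the cache, so entries are
        # rebuilt lazily, on the next request for each path.
        if new_version != self._version:
            self._version = new_version
            self._store.clear()
```

Usage would be along the lines of `cache = DatasetCache(render, "v1")`, then `cache.get("/data/Berlin")` per request and `cache.upgrade("v2")` once a new dataset is loaded. A real deployment would more likely rely on an HTTP reverse proxy and standard cache headers than on application code like this.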
