with
us, as we are planning to use Skype or similar things to allow remote
participation.
Cheers,
Mathieu d'Aquin (Watson)
Giovanni Tummarello (Sindice)
[1] http://tinyurl.com/3m7ufj (note that the page is editable if you want to
add your view
endpoints/sites?
Cheers,
Peter
2008/5/30 Giovanni Tummarello [EMAIL PROTECTED]:
A validator in sindice is possible and has been discussed but the list
of things to do is now quite scary :-)
poor man's validator: please post about your sitemap here
http://forum.sindice.com/index.php . Free report
as a void:example_file)
The rest of the descriptions seem to be allowed for by current
vocabularies such as foaf and dc so the actual specification will be
very highly modular and hence easy to implement and agree on IMO.
Cheers,
Peter
2008/6/12 Giovanni Tummarello [EMAIL PROTECTED]:
Wasn't RDF
-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Giovanni Tummarello
Sent: Thursday, June 12, 2008 12:08 AM
To: Hausenblas, Michael
Cc: public-lod@w3.org; Semantic Web
Subject: The king is dressed in void
Wasn't RDF all about being self-describing?
if i say giovanni
Hi Michael,
let me clarify that it wasn't really meant to be directed at you: Michael, you were
just there when I replied to the general idea, and not you in
particular :-)
step after semantic sitemaps (it actually is thought to extend it in
terms of using the sc:datasetURI as the entry point, see also
the slicing and sparql graph
parts to describe a data set?
Cheers,
Peter
2008/6/13 Giovanni Tummarello [EMAIL PROTECTED]:
All of your described functionalities are a subset of what semantic
sitemaps are for [1].
Specs aside, the paper [2] might be of interest to some is that we
went to some
need to
dig our hole deeper by showing yet more reinventing.
Giovanni
On Fri, Jun 13, 2008 at 1:30 AM, Peter Ansell [EMAIL PROTECTED] wrote:
2008/6/13 Giovanni Tummarello [EMAIL PROTECTED]:
XML is a step forward. The thing started in RDF with something called
semantic crawling ontology (sorry
Hi Hugh,
as far as Sindice is concerned, please just post your message on
http://forum.sindice.com and we'll be able to follow your data case
closely.
as far as large datasets are concerned, the indexing is currently
manual, that is, we must personally know of the dataset (e.g. from a
post in the
://sws.geonames.org/2950157/
Actually what we need is a namespace and vocabulary for all those flavors of
URI similarity and equivalence to be used on the Web, different from the OWL
and RDFS namespaces.
Bernard
Giovanni Tummarello a écrit :
http:... or something equivalent, not a reference http
Hi Jason,
I believe you're pursuing exactly the same goal as the Okkam project
(http://okkam.org).
Unlike Okkam, however, you have something up already at a nice, visible,
uncluttered website.
This mail of mine is just so that you know that there is this common
research effort and in fact to say
Just out of the RDF playground for a second,
http://www.evri.com/mainline-ui/jsp/index.jsf
seems to know..
Madonna divorce Guy Ritchie: it will give you 5 sources on the
web that say that (last 3 days..)
Madonna * Guy Ritchie
returns many more things (some of which are noisy etc..) but
Should be possible, same way as google indexes the other pages.
If they get a semantic sitemap online it would be much better, will
ask for it.
Giovanni
On Wed, Oct 29, 2008 at 10:20 AM, Andreas Langegger [EMAIL PROTECTED] wrote:
ain't that funny?
After Alan's talk last week at WOD-PD we
Hi Jim,
honestly, a count job we launched some time ago gave us something
less than a billion on Sindice actually (but we currently don't index
Uniprot, which is a big one). We'll be publishing live stats soon. But
what about wrappers (e.g. flickr wrappers of keyword searches), that's
a
Hi
when people liked to draw maps of the WWW, and these really quickly
disappeared when it got big. I hope that happens to the Data Web, too.
Hopefully soon. But my current estimate is that the Data Web is probably
This has happened already, for the Data Web as in Microformat world
and
dbtune.org provides at least 14 billion triples (see
http://blog.dbtune.org/post/2008/04/02/DBTune-is-providing-131-billion-triples
+ the Musicbrainz D2R server at http://dbtune.org/musicbrainz/, so I
guess you'd need a pretty big phone to aggregate all that :-)
.. thus the problem with
Overall, that's about 17 billion.
IMO considering MySpace's 12 billion triples as part of LOD is quite a
stretch (same with other wrappers) unless they are provided by the
entity itself (e.g. I WOULD count in LiveJournal FOAF files on the
other hand; ok, they're not linked, but they're no less
rdfs:seeAlso links, and by querying the Sindice search engine. The library
Cool, this is the original number 1 task Sindice was conceived to do,
that is, provide the inverse of the seeAlso, the inverse links for
automatic mashups.
Happy to be of use :-) (now all that people have to do is reuse
Hi Misha,
would you have a comparison between this and the google social graph api?
I understand that also follows FOAF links (e.g. see livejournal etc).
I guess they're less specialized however?
Giovanni
On Wed, Nov 26, 2008 at 3:25 PM, [EMAIL PROTECTED]
[EMAIL PROTECTED] wrote:
Hello,
Am
I agree with all your comments, and believe me, by talking to actual web
2.0 people you're way ahead.
I'll try to answer some of your questions.
I then asked if they knew the value of Linked Data. The answer I got was:
well, I would think that my site would be easier to find, right? I mean, I
would
- My company has recently released an API for access to structured
(database) data about 55 million companies and 35 million people. Do
you think I should release this in an LOD format? How would my
customers benefit?
could be tricky;
usually one such API involves looking up and finding
Yves,
just on the side: yes, there is not much dbtune in Sindice, just a few
http://sindice.com/search?q=dbtune&qt=term
if you have an RDF dump of the site or of part of it and you express
it in a semantic sitemap you would be indexed in full in very short time.
Otherwise we should have the ne
I hope that DBpedia Lookup is useful for you, and I'd appreciate any
feedback.
URI lookup as well as other searches are important,
so to facilitate other LOD dataset providers to also do this I'd
suggest they simply wrap around Sindice and take the first results,
e.g.
/?group_id=227929
WWW 2009 Research Track paper:
Danh Le Phuoc, Axel Polleres, Christian Morbidoni, and Manfred
Hauswirth, Giovanni Tummarello. Rapid semantic web mashup development
through semantic web pipes. In Proceedings of the 18th World Wide Web
Conference (WWW2009), Madrid, Spain, April 2009
Wow, lots of stuff.. how many triples in total then? How many machines,
and of which kind? Very interested.
Giovanni
On Thu, Feb 19, 2009 at 10:38 PM, Kingsley Idehen
kide...@openlinksw.com wrote:
All,
We now have part 1 of the Virtuoso 6.0 Cluster Edition with LOD hosting that
includes:
1.
Forwarding from Robert Fuller, now the primary contact for support
on the project.
We have moved broken pipes away, sorry for the problems in the previous
release.
Giovanni
-- Forwarded message --
From: Robert Fuller [DERI] robert.ful...@deri.org
Date: Thu, Feb 26, 2009
congrats and kudos to all those who've made this happen. I think the cloud
diagrams are proving a very compelling visual for people who don't care
about nerdy detail but understand the idea of interlinked datasets.
Yes they're great for handwaving if the audience has never seen it,
otherwise
Hi Andreaz :-)
I don't see the difference between the LOD model and the data (including
links) itself. At least to us at Zemanta it is immensely helpful to have
a lot of those links done. It brings down the cost of doing really
innovative stuff to us and I believe to many others too.
We
Hi Daniel,
the Semantic Sitemap Extension does that well (it also has the important task
of telling the world that dbpedia is not 6 million RDF models but a single one
which is split on the fly):
http://sw.deri.org/2007/07/sitemapextension/
http://dbpedia.org/sitemap.xml
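For illustration, a minimal Semantic Sitemap fragment might look like the following; this is a sketch based on my reading of the extension spec linked above, and all dataset URIs and locations here are made-up examples:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <!-- one sc:dataset block per published dataset (example values) -->
  <sc:dataset>
    <sc:datasetLabel>Example dataset</sc:datasetLabel>
    <sc:datasetURI>http://example.org/dataset</sc:datasetURI>
    <sc:linkedDataPrefix>http://example.org/resource/</sc:linkedDataPrefix>
    <sc:sparqlEndpointLocation>http://example.org/sparql</sc:sparqlEndpointLocation>
    <sc:dataDumpLocation>http://example.org/dumps/all.rdf.gz</sc:dataDumpLocation>
    <sc:changefreq>weekly</sc:changefreq>
  </sc:dataset>
</urlset>
```

This is what lets a crawler treat the many resource pages as slices of one dataset rather than millions of independent RDF models.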
Giovanni
On Fri, Mar 6, 2009
Hi
I could query the site for its sitemap extension (would it always be home
url/sitemap.xml? doesn't seem so...), as Giovanni suggests, and see if I
get a result; in the affirmative case, I have to parse it and look for the
sc:sparqlEndpointLocation element.
Sitemaps are either at
I know what's missing :-): a real application that needs to do the
automatic discovery etc. and that someone would really want to use, i.e. not
another academic demonstrator.
If there was one such application people would put in the last 10 minutes of
work.
People do at this point go to the
===
New Features
[2009-03-04] Added sesame xpath functions library including concat,
lowercase and uppercase
Example query using fn:concat follows:
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
SELECT ?name WHERE { ?s ?p ?name .
FILTER ( ?name = fn:concat('Giovanni ', 'Tummarello') ) }
Hi Jamie,
I see that your RDF per URI is more expressive than the usual:
instead of just giving triples out of (or into) the subject of the
page, you also give the description of other notable entities inside;
for example in the Blade Runner movie you give the full description of
all the film
The only reason to mint resolvable URIs is to allow fetching of a description
I'd say that minting in other people's spaces is really calling for
trouble and should be discouraged; one should, could, possibly put a
sameAs if some URI exists somewhere else.
honestly? I don't even see the reason why
if it's one source, then fine, the source is changed and it's indexed
again if it has been copied.. everybody loses, I'd say :-)
Yes, Data Access by Reference is about not having to interact with Data
by Value which requires localization of data in order to actually use the
values :-)
..
The point was this: _if_ you would like your data to be incorporated into
the dogfood site, then it should have dogfood namespace URIs, otherwise we
cannot serve it. We hope to offer people who want to contribute to the site
what about creating local, arbitrary URIs, linked with sameas to the
Hi Yves,
nothing can beat having a semantic sitemap [1]. Basically you say that you
change once a day and give a link to the dump. Done :-)
If you put it up I am ready to show in Sindice the information updated every
day, and with no other cost for you than a single dump download.
also the sitemap
Forced to mention RDFSync then (ISWC 2007)
Giovanni Tummarello, Christian Morbidoni, Reto Bachmann-Gmür, Orri Erling
RDFSync: efficient remote synchronization of RDF models
http://semanticweb.deit.univpm.it/papers/RDFSyncISWC2007.pdf
there was an implementation but it was just a proof
RDFa will not generally negate the essential separation of Name (via
URI.URN-URL) and Address (via URI.URL) since Linked Data oriented triples
will still contain de-referencable URIs :-)
if you can put the RDF and the human legible HTML version in the same
address there is absolutely no
, May 17, 2009 at 3:08 AM, Peter Ansell ansell.pe...@gmail.com wrote:
2009/5/17 Giovanni Tummarello g.tummare...@gmail.com:
for graphs which use a (specific) FOAF term. It's a bit like
PingTheSemanticWeb or Sindice, but decentralized based on the ontologies
used.
[]
Isnt this like
Hi,
there isn't a single answer, unfortunately.
Let's take symmetric concise bounded descriptions (SCBD), which basically
means from the URI you'll get triples around it recursively until you
find other URIs (so when you find a blank node you keep on going).
This seems a pretty good way to provide
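A minimal sketch of that recursion, assuming triples are held as plain Python tuples and blank nodes are written with a "_:" prefix (the data model and all names here are made up for illustration, not Sindice's actual code):

```python
# Sketch of a symmetric concise bounded description (SCBD):
# starting from a URI, collect every triple in which it appears as
# subject OR object, and keep expanding through blank nodes only.

def is_bnode(node):
    """Blank nodes are written as strings starting with '_:'."""
    return isinstance(node, str) and node.startswith("_:")

def scbd(triples, start):
    frontier, seen, result = {start}, set(), set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in triples:
            if s == node:            # outgoing arcs
                result.add((s, p, o))
                if is_bnode(o):      # recurse only through blank nodes
                    frontier.add(o)
            if o == node:            # incoming arcs (the "symmetric" part)
                result.add((s, p, o))
                if is_bnode(s):
                    frontier.add(s)
    return result
```

Expansion stops at any named URI, so the description stays bounded while still carrying the anonymous substructure on both sides of the starting resource.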
Hi Dan,
storing (and being able to re-execute) this journey reminds me of the
driving inspiration behind DERI Pipes. Pipes have an underlying XML
representation language which stores the recipe for processing one
or more RDF sources; arbitrary operators can select data out of it and
return another RDF
Cool Hugh :-) great Ajax-y thing as well.
if you don't do this already it might make sense to also add
from your page you say:
There is currently no service to enable arbitrary contribution to
the contents. If you have significant data you would be prepared to
give us, then please contact us at the
a New Zealander and a Kiwifruit)
throws up a radio station, an animated cartoon and lots of wordnet links to a
juggle of plumbing but no juice. No sign of
http://dbpedia.org/resource/Kiwi however
Ah.
We only look at the first n results from Sindice, and clearly kiwi is a
popular name.
On Fri, Jun 12, 2009 at 9:44 AM, Toby Inkstert...@g5n.co.uk wrote:
On Fri, 2009-06-12 at 01:33 +0200, Andraz Tori wrote:
also to note is that there exist proper mappings to other efforts at
tagging ontologies:
http://commontag.org/mappings
The question is though, will Search Monkey, Sindice,
Just a remark about what we're doing in Sindice, for all who want to
be indexed properly by us.
we recursively dereference the properties that are used, thus trying to
obtain a closure over the descriptions of those properties.
We also consider OWL imports.
When the recursive fetching
Just RDFa and live happy, IMO. A machine doesn't care about the messy
part of the markup. The advantage of a single URL to access it is too
much to be a match for anything.
It is a fact that people like us like to look at RDF directly as well.
But it shouldn't be a problem to use a Firefox plugin to
Martin,
partially you could solve the problem yourself by putting the
owl:imports triples in your ontology fragments, e.g. the fragment, when
served, says owl:imports so that you're sure the ontology is used as
a whole..
would this do it? :-) fixing the problem in a single location might
be so much
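A sketch of what such a served fragment could contain, in Turtle (the ontology URIs are hypothetical, used only to show the shape of the owl:imports triple):

```turtle
# An ontology fragment that, when dereferenced on its own, points
# back to the full ontology so consumers load it as a whole.
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://example.org/onto/fragment1>
    a owl:Ontology ;
    owl:imports <http://example.org/onto/full> .
```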
/, Giovanni
Tummarello http://www.deri.ie/about/team/member/giovanni_tummarello/, Stefan
Decker http://www.deri.ie/about/team/member/stefan_decker/
*Context Dependent Reasoning for Semantic Documents in Sindice.*
In *Proceedings of the 4th International Workshop on Scalable Semantic Web
Knowledge Base
I answer to Toby just because it's handy to do so, but I just want to
make a general statement.
Toby is stating the classical view: clean knowledge representation, 0%
dealing with ambiguity.
What Hugh is hinting at is that the complexity of the clean solution is
overwhelming, since it is
Dear Web of Data enthusiasts,
we are very happy to share with you today the first public version of
Sigma, http://sig.ma , a browser, a mashup engine and an API for the
web of data.
here is a blog post with a screencast, sample Sigma embedded mashup etc.
/23 Giovanni Tummarello giovanni.tummare...@deri.org:
Dear Semi-Structured Data Enthusiasts,
we are today pleased to announce version 1 of Sparallax.
Sparallax is an adaptation of the Freebase Parallax to use SPARQL endpoints.
Thanks to a proxy and query translation module (SPARQL to MQL and
results translated back), Sparallax is minimally
Hi Kingsley,
we are a bit unsure about your complaint, please clarify: do you mean
to say that Sparallax gives that user agent when trying to connect to
an external SPARQL endpoint? We tried and got the user agent of the
browser. Not sure how it is important to use a user agent instead of
-unibw.org wrote:
Hi Giovanni:
Giovanni Tummarello wrote:
Hi Martin, all,
the sitemap exposed is not a Semantic Sitemap
Semantic Sitemap: http://products.semweb.bestbuy.com/sitemap.xml
but simply gives the location of the dumps.
As far as I see, the sitemap at
http
-unibw.org]
Sent: Tuesday, September 01, 2009 8:14 AM
To: giovanni.tummare...@deri.org
Cc: public-lod@w3.org
Subject: Re: ANN: BestBuy.com starts publishing full catalog as RDF/XML using
GoodRelations - 27 million triples
Hi Giovanni:
Giovanni Tummarello wrote:
Hi Martin, all
*Promotion* :-)
Accessing dbpedia with sparallax http://sparallax.deri.ie
Full announcement at
http://blog.sindice.com/2009/10/12/new-inspector-full-cache-api-all-with-online-data-reasoning/
quotable text:
---
We’re happy to release today 2 distinct yet interplaying features in
Sindice: The Sindice Inspector and the Sindice Cache API (both
including
Kind of makes me think.. we could put it virtually back in the same
place as originally on our Sindice cache [1]
I wonder if the operation makes sense.. on the one hand a cache is
usually intended for reflecting reality, on the other I'd see obvious
practical advantages.
Maybe we could offer an
With respect to crawling and scraping or sponging or .. trying to
guess based on partial fragments of structured information, I can say
3 things:
a) No, we're not doing it at the moment, we are only covering those
who chose to put structured semantics. Some book stuff shows up in
Sig.ma .. e.g.
I'd say, if I understand well,
that that works only for queries where you need the extra dereferenced
data just additionally, e.g. to add a label to your result set;
if you need the remote, on-the-fly reference data to e.g. sort by
price you'd have to fetch it all from the remote site ..
Gio
On Sun,
Giovanni Tummarello wrote:
A) The wrapper's Semantic Sitemap points you at the original Sitemap, and
says how it is doing the wrapping. And because you know how the wrapper is
behaving, you can process the standard Sitemap to get the information you
want about what the wrapping site provides.
Actually, the slicing in
- general chair Enrico Motta:
http://data.semanticweb.org/person/enrico-motta (see that is general chair
2009)
- a paper from the research track:
http://data.semanticweb.org/conference/iswc/2009/paper/research/311
- a workshop at ISWC2009:
Wrt this,
i feel like sharing how we address this issue in Sindice and the tools
we provide.
We do materialization at a central level, following recursively the links
to ontologies, e.g. by resolving property names.
This allows data producers to be considerably more concise in the
markup (e.g. think
Hi Vasily yes, you can use Sindice for that purpose.
either from asking data from the full reasoned cache (ask away, we
can serve plenty) or from the reasoning API (with a bit of moderation,
it is an intense process, although we do have many layers of caching)
a blog post about the details
change or better
reasoning happens or new data etc.) + serialization not fully performed
automatically would seem unrealistic
On Wed, Apr 7, 2010 at 12:38 PM, Vasiliy Faronov vfaro...@gmail.com wrote:
Giovanni Tummarello wrote:
In this case materialization is likely not going to happen much
+1 thanks Nathan for pointing this out, very very relevant.
luckily so far it seems a bit too rooted in the MS stack of things (just
looking at it very very superficially) :-)?
Gio
ps: realistically there's the whole microsoft thing to keep in the back
of our minds; they have pretty much a
Hi Leigh
i tell you what we're going to be supporting in Sindice very soon and
it would be great if you could add it to the table:
simple existing sitemaps :-). Sitemaps provide the list of URLs to
crawl and for each one either a last-updated field or update
frequency.
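For reference, a plain sitemaps.org entry carrying both of those fields looks like this (the URL and dates are examples):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.org/data/page1</loc>
    <lastmod>2010-11-01</lastmod>   <!-- last-updated field -->
    <changefreq>daily</changefreq>  <!-- update frequency -->
  </url>
</urlset>
```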
If the website cares to
sws.geonames URIs, SPARQL endpoint etc. Bearing in mind that Geonames.org
has no dedicated resources for it, who will take care of that in a scalable way?
What is the business model? Good questions. Volunteers, step forward :)
Bernard
Hi Bernard, the need to automatically interlink at large
so hang on tight a bit.. we're working on this, just continue
publishing high quality data with good entity descriptions (as much as
you know about YOUR stuff), and the links will come to you just like
that at some point. I promise :)
WOW ... rings a bell ...and all these things will be
Hi all,
A new version of the Sindice frontend with some interesting improvements,
e.g. a realtime data widget on the homepage, and the new API to
restrict to new documents of the day (or week), etc.
http://sindice.com
Also Facebook support for RDFa is making the web now bubble with new triples.
See
For the interested,
within several new EU projects there are now hiring opportunities
available to work on Sindice current and future services: cloud
computing postdoc/researcher, cloud/semantic/integration developers.
Internships also available with possible ph.d continuation.
Good community
Hi there :-) looks very cool.
could you please point us to the specifics of the protocol? so we can start
considering integrating in Sindice
Note: we're about to announce (monday?) delta support in Sindice based on
Sitemaps lastmod which seems to be the easiest possible for the HTML+ RDFa
world.
Apologies for cross posting
-
Dear all
So far semantic web search engines and semantic aggregation services have
been inserting datasets by hand or have been based on random walk like
crawls with no data completeness or freshness guarantees.
After quite some work, we are happy to
Jorn, you're right.
Linked data with plain dereferenceable URIs plain doesn't work once you
move from the simplest examples. This is for some of the reasons you
mention as well as others (e.g. how do you really ask what are the
1000 URIs most visited (assuming this was in the DB) or the
Only solution for you now is to use SPARQL instead of resolving the URI.
Much less traffic and it would actually work
SPARQL doesn't make the problem go away, it just pushes the limits further
out. SPARQL endpoints that see significant traffic have similar restrictions
built in, either on
Thanks Paul, this sort of feedback is indeed tremendously useful.
I somehow just wish you had had 1/10th of the replies of the "subjects as
literals" thread :-)
Gio
(obviously we're talking business of LOD at large and the true state of it
despite the growing number of lines in the lod cloud diagram.
Hi Matthias,
sorry for the delay. It is indeed a possible API, which we call a
long-standing query or notification API. Not yet available, but
we have many requests for it so it will come.
my advice at the moment would be to do it yourself client side using
say a DB state and fetching the data from
But again: I agree that crawling the Web of Data and then deriving a dataset
catalog as well as meta-data about the datasets directly from the crawled
data would be clearly preferable and would also scale way better.
Thus: Could please somebody start a crawler and build such a catalog?
As
Hi Ian
no, it's not needed, see this discussion
http://lists.w3.org/Archives/Public/semantic-web/2007Jul/0086.html
pointing to 303, 406 or others..
..but a number of social community mechanisms will activate if you
bring this up, ranging from russian style you're being antipatriotic
criticizing the
I think it's an orthogonal issue to the one RDFa solves. How should I
use RDFa to respond to requests to http://iandavis.com/id/me which is
a URI that denotes me?
hashless?
mm one could be to return HTML + RDFa describing yourself. add a
triple saying http://iandavis.com/id/me
I might be wrong but I don't like it much. Sindice would index it as 2
documents:
http://iandavis.com/2010/303/toucan
http://iandavis.com/2010/303/toucan.rdf
I *really* would NOT want two different URLs resolving to the same thing
thanks
Giovanni
On Fri, Nov 5, 2010 at 10:43 AM, Ian Davis
How about something that's totally independent from HEADER issues?
think normal people here. absolutely 0 interest to mess with headers
and http responses.. absolutely no business incentive to do it.
as a baseline think of someone wanting to annotate with RDFa a
hand-crafted, Apache-served HTML page
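As a rough sketch of that baseline: a static, hand-written page with RDFa (FOAF terms, hypothetical URIs) needs nothing beyond the markup itself, and Apache serves it as a plain file with no special headers:

```html
<!-- a static file served as-is; no content negotiation,
     no custom HTTP headers, just markup (URIs are examples) -->
<div xmlns:foaf="http://xmlns.com/foaf/0.1/"
     about="http://example.org/page#me" typeof="foaf:Person">
  My name is <span property="foaf:name">Jane Example</span>.
</div>
```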
Bravo Harry :-)
let me also add: without adding anything to the header.. *keeping HTTP
completely outside the picture*
HTTP headers are for pure optimization issues, almost networking level.
Caching, fetching, crawling: nothing to do with semantics.
A conjecture: the right howto document is about 2
Yes, Sig.ma heavily checks for properties that are subproperties of label
and uses them.
I think sparallax as well.
Gio
On Fri, Nov 12, 2010 at 12:08 PM, Dan Brickley dan...@danbri.org wrote:
Dear all,
The FOAF RDFS/OWL document currently includes the triple
foaf:name rdfs:subPropertyOf
- the rest of the web continue to use 200
Tim
yes, but the rest of the web will use 200 also to show what we would
consider 208, e.g.
http://www.rottentomatoes.com/celebrity/antonio_banderas/
see the triples
Boris, would you be able to provide a bit of explanation on why you
would want to do that, e.g. what evidence is there (nice use cases) where
an RDF export of low-level features in the map is of use
thanks!
Gio
On Mon, Jan 17, 2011 at 2:34 AM, Boris Villazón Terrazas
bvilla...@fi.upm.es wrote:
To the best of my knowledge there isn't anything out there that one could
call modern and updated.
Something modern and credible would be actual data + social backing
(votes, comments, etc.). As said in the past, we in Sindice would be
delighted to provide the data part if anyone wanted to
sindice.com main index has 37,312,159 documents with occurrences of foaf:person.
http://sindice.com/search?q=foaf%3Aperson
(a lot of these come from microformats via the any23 library but anyway)
which means there are many more actual persons inside.
Gio
On Wed, Apr 13, 2011 at 10:15 AM, Bernard
, Apr 13, 2011 at 4:48 PM, Giovanni Tummarello
giovanni.tummare...@deri.org wrote:
Hi Frank, my 2c from the Sindice.com point of view.. (as we struggle
to actually make use of all this and make it easy for others to use)
I wouldn't really worry too much:
just give to the machines what you'd give to humans; that technically
means simply make sure all the pages you display (and that
So, can someone clarify, if possible, whether, if I publish a page using RDFa
and schema.rdfs.org syntax, it will be properly parsed and indexed in any of
those search engines?
that's all they'd have to say not to piss people off, but they decided
not to do it.
didn't cost anything. pretty
my2c
i would seriously advise against using triples with http://schema.rdfs.org .
That would be totally and entirely validating their claim that either
you impose things or fragmentation will destroy everything, and that
talking to the community is a waste of time.
For how little this matters
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html
On 9 Jun 2011, at 09:54, Giovanni Tummarello wrote:
My sincere congratulations; I had somehow overlooked the level of
detail needed here.
The choices are pragmatic and - in my personal opinion, having talked
directly at SemTech with a lot of people involved in this - should
serve the community as well as possible.
will you be posting this as a
Hi Tim,
documents per se (a la HTTP 200 response) on the web are less and
less relevant as opposed to the conceptual entities that are represented
by these documents and held e.g. as DB records inside CMSs, social networks
etc.
e.g. a social network is about people; those are the
This year, the Billion Triple Challenge data set consists of 2 billion
triples. The dataset was crawled during May/June 2011 using a random sample
of URIs from the BTC 2010 dataset as seed URIs. Lots of thanks to Andreas
Harth for all his effort put into crawling the web to compile this
particular confusion is so destructive. Unlike the dogs-vs-bitches case,
the difference between the document and its topic, the thing, is that one is
ABOUT the other. This is not simply a matter of ignoring some
Could it be exactly the other way around? that documents and things
described in
Hi Nicolas,
Its getting in Sindice indeed - quite politely e.g. 1 every 5 secs-
we'll monitor speed and completeness. iff you think its ok for us to
crawl faster please say so via robot.txt directive or just say so
) channels for data publication
over the web, which serve different goals.
Maybe we need to better articulate the practices and expectations, though...
Cheers,
Antoine
Hi Giovanni,
Le 09/07/2011 23:10, Giovanni Tummarello a écrit :
Hi Nicolas,
Its getting in Sindice indeed -
Yes, I