On 4/13/2012 6:35 AM, baran_H wrote:
> a.) Do you see local installations only as a temporary solution
> until public SPARQL endpoints get more powerful and cheaper in
> the future?
I think the pendulum will swing toward and away from "the cloud," and
I think there's a place for everything.
Most advances in hardware and software will help both local
installations and public endpoints. The one thing that would
specifically help public endpoints is operating them as distributed
main-memory databases, but for the economics of that to work out you
need very high query volume.
> b.) Or do you seriously envision a general Linked Data concept
> based more and more on downloaded RDF datasets and locally
> installed SPARQL endpoints, with all the consequences, e.g.
> conceptually and heavily constraining the potential of web-wide
> querying crowds?
>
It comes down to the "proof" and "trust" layers of the Linked Data
stack. Even if we don't have fully automated answers for these, the
fact is that different Linked Data sources are written from different
points of view, and to maintain a coherent point of view in your own
system you have to decide what you "trust," and to what extent.
If there's a particular piece of the Linked Data web that's well
behaved and well understood, you can build a simple app that exploits
it. In general, though, the Linked Data web is a wild and woolly
place, and you need to do some data cleanup before you can write
useful queries. So you need a system like Sindice, which builds a
knowledge base (in their case 50*10^9 triples) from a crawl.
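To make the "cleanup" point concrete, here is a minimal sketch (the
file names are hypothetical) of the mechanical first pass you
typically make over an N-Triples crawl before loading anything
anywhere: drop exact-duplicate triples and lines that aren't even
plausibly well formed.

    # Minimal cleanup sketch for an N-Triples crawl; file names are hypothetical.
    import gzip

    seen = set()          # fine for a sample; for billions of triples, sort the file instead
    kept = dropped = 0

    with gzip.open("crawl.nt.gz", "rt", encoding="utf-8", errors="replace") as src, \
         open("crawl-clean.nt", "w", encoding="utf-8") as dst:
        for raw in src:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            # Crude check: subject is a URI or blank node, statement ends with " ."
            if not (line.startswith("<") or line.startswith("_:")) or not line.endswith(" ."):
                dropped += 1
                continue
            if line in seen:                    # exact duplicates are common in crawls
                continue
            seen.add(line)
            dst.write(line + "\n")
            kept += 1

    print("kept", kept, "triples; dropped", dropped, "malformed lines")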
I see Linked Data as being more like a conversation between
humans than a conversation between neurons. An agent working in this
space needs to have some ability to ground terms, which means having
either a 10^8+ triple 'generic database' or a beyond-state-of-the-art
upper ontology of some kind.
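What "grounding" looks like mechanically is easy to sketch; assume
you have already pulled a label-to-URI table out of such a generic
database (the file name, the format, and the choice of rdfs:label are
all assumptions here):

    # Hypothetical sketch: ground surface strings against a label -> URI table
    # extracted in advance from a large generic database (e.g. rdfs:label triples).
    from collections import defaultdict

    labels = defaultdict(list)                      # "berlin" -> [candidate URIs]
    with open("labels.tsv", encoding="utf-8") as f: # one "label<TAB>uri" per line
        for line in f:
            if "\t" not in line:
                continue
            label, uri = line.rstrip("\n").split("\t", 1)
            labels[label.lower()].append(uri)

    def ground(term):
        """Return candidate URIs for a term, or [] if the agent can't ground it."""
        return labels.get(term.lower(), [])

    print(ground("Berlin"))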
> c.) Would public SPARQL endpoints 'in the final analysis' be much
> more powerful and cheaper if they were (forget SQL completely)
> implemented with a small subset of SPARQL that allows only a class
> of fundamental and relatively simple queries, algorithmically
> optimized for high performance, supported by various indexing
> methods, etc.?
>
Of course.
There's a lot of room for specialized techniques.
Not long ago I figured out a really interesting calculation
that could be expressed in SPARQL. Running that SPARQL query for
all of the terms in our knowledge base would have taken 100 years,
but I had just 2 weeks to deliver a product. With more hardware and a
different triple store, maybe I could have done it in 10 years or 5
years (... and I would have blown the schedule just negotiating the
software license with my boss and the vendor).
        Instead I developed a specialized algorithm that did the
calculation in 24 hours.
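The calculation itself doesn't matter here; the shape of the change
is roughly this (the file name and the statistic being computed below
are just stand-ins): instead of issuing one SPARQL query per term,
make a single streaming pass over the dump and aggregate as you go.

    # Stand-in illustration, not the real calculation.
    #
    # Slow shape: for each term, run one SPARQL query (query overhead x 10^8 terms).
    # Fast shape: one pass over the dump, aggregating as you go.
    import gzip
    from collections import Counter

    counts = Counter()   # for a full-size dump you'd sort the file and merge runs instead
    with gzip.open("basekb.nt.gz", "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.startswith("<"):
                subject = line[1:line.index(">")]
                counts[subject] += 1            # e.g. how many statements mention each subject

    for uri, n in counts.most_common(10):
        print(n, uri)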
To go back to Sindice, they developed a framework for building
a full-text index out of RDF data while bypassing the triple store.
It's fast and very scalable.
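The general idea is easy to sketch (this is just the idea, not
Sindice's actual pipeline; the label predicate and file name are
assumptions): read the N-Triples stream once and build the inverted
index directly, never touching a triple store.

    # Hedged sketch of the idea, not Sindice's pipeline: build a tiny inverted
    # index straight from an N-Triples stream, bypassing the triple store.
    import gzip, re
    from collections import defaultdict

    LABEL = re.compile(r'^<(\S+)> <http://www\.w3\.org/2000/01/rdf-schema#label> "(.*?)"')
    index = defaultdict(set)                        # token -> set of subject URIs

    with gzip.open("dump.nt.gz", "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LABEL.match(line)
            if m:
                uri, label = m.groups()
                for token in re.findall(r"\w+", label.lower()):
                    index[token].add(uri)           # real systems write postings to disk

    print(sorted(index.get("einstein", ())))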
I've played up the option of loading :BaseKB into a triple store
because it really is straightforward, flexible and a lot of fun.
However, you can do really amazing things with this kind of RDF dump
without the triple store, even on a 32-bit machine.
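For instance (the file name and the predicate filter are
assumptions), you can slice a dump like :BaseKB down to just the
statements you care about with a one-pass filter whose memory use
stays flat no matter how big the file is:

    # Minimal sketch: work with the dump directly, no triple store.  Streams the
    # file line by line, so memory stays flat even on a 32-bit machine.
    import gzip

    WANTED = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"   # predicate to keep (assumed)

    with gzip.open("basekb.nt.gz", "rt", encoding="utf-8", errors="replace") as src, \
         open("types.nt", "w", encoding="utf-8") as dst:
        for line in src:
            if WANTED in line:      # crude predicate filter on the raw line
                dst.write(line)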