On 4/13/2012 6:35 AM, baran_H wrote:
> a.) Do you see local installations only as a temporary solution
> until public SPARQL endpoints get more powerful and cheaper in
> the future?
I think the pendulum will swing toward and away from "the cloud," and
I think there's a place for everything.
Most advances in hardware and software will help both local
installations and public endpoints. The one thing that would
specifically help public endpoints is operating them as distributed
main-memory databases, but for the economics of that to work out you
need very high query volume.
> b.) Or do you seriously envision a general Linked Data concept
> based more and more on downloaded RDF datasets and locally
> installed SPARQL endpoints, with all the consequences, e.g.
> conceptually and heavily constraining the potential of web-wide
> querying crowds?
>
It comes down to the "proof" and "trust" layers of the Linked Data
stack. Even if we don't have fully automated answers for these, the
fact is that different Linked Data sources are written from different
points of view, and to maintain a coherent point of view in your own
system you have to decide what you "trust," and to what extent.
If there's a particular piece of the Linked Data web that's well
behaved and well understood, you can build a simple app that exploits
it. In general, though, the Linked Data web is a wild and woolly
place, and you need to do some data cleanup before you can write
useful queries. So you need a system like Sindice, which builds a
knowledge base (in their case 50*10^9 triples) from a crawl.
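To make the "cleanup" point concrete, here is a minimal sketch (the
file names are hypothetical) of the mechanical first pass you
typically make over an N-Triples crawl before loading anything
anywhere: drop exact-duplicate triples and lines that aren't even
plausibly well formed.

    # Minimal cleanup sketch for an N-Triples crawl; file names are hypothetical.
    import gzip

    seen = set()          # fine for a sample; for billions of triples, sort the file instead
    kept = dropped = 0

    with gzip.open("crawl.nt.gz", "rt", encoding="utf-8", errors="replace") as src, \
         open("crawl-clean.nt", "w", encoding="utf-8") as dst:
        for raw in src:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            # Crude check: subject is a URI or blank node, statement ends with " ."
            if not (line.startswith("<") or line.startswith("_:")) or not line.endswith(" ."):
                dropped += 1
                continue
            if line in seen:                    # exact duplicates are common in crawls
                continue
            seen.add(line)
            dst.write(line + "\n")
            kept += 1

    print("kept", kept, "triples; dropped", dropped, "malformed lines")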
I see Linked Data as being more like a conversation between
humans than a conversation between neurons. An agent working in this
space needs to have some ability to ground terms, which means having
either a 10^8+ triple 'generic database' or a beyond-state-of-the-art
upper ontology of some kind.
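What "grounding" looks like mechanically is easy to sketch; assume
you have already pulled a label-to-URI table out of such a generic
database (the file name, the format, and the choice of rdfs:label are
all assumptions here):

    # Hypothetical sketch: ground surface strings against a label -> URI table
    # extracted in advance from a large generic database (e.g. rdfs:label triples).
    from collections import defaultdict

    labels = defaultdict(list)                      # "berlin" -> [candidate URIs]
    with open("labels.tsv", encoding="utf-8") as f: # one "label<TAB>uri" per line
        for line in f:
            if "\t" not in line:
                continue
            label, uri = line.rstrip("\n").split("\t", 1)
            labels[label.lower()].append(uri)

    def ground(term):
        """Return candidate URIs for a term, or [] if the agent can't ground it."""
        return labels.get(term.lower(), [])

    print(ground("Berlin"))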
> c.) Would public SPARQL endpoints 'in the final analysis' be much
> more powerful and cheaper if they were (forget SQL completely)
> implemented with a small subset of SPARQL that allows only a class
> of fundamental and relatively simple queries, algorithmically
> optimized for high performance, supported by various indexing
> methods, etc.?
>
Of course.
There's a lot of room for specialized techniques.
Not long ago I figured out a really interesting calculation
that could be expressed in SPARQL. Running that SPARQL query for
all of the terms in our knowledge base would have taken 100 years,
but I had just 2 weeks to deliver a product. With more hardware and a
different triple store, maybe I could have done it in 10 years or 5
years (... and I would have blown the schedule just negotiating the
software license with my boss and the vendor).
        Instead I developed a specialized algorithm that did the
calculation in 24 hours.
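The calculation itself doesn't matter here; the shape of the change
is roughly this (the file name and the statistic being computed below
are just stand-ins): instead of issuing one SPARQL query per term,
make a single streaming pass over the dump and aggregate as you go.

    # Stand-in illustration, not the real calculation.
    #
    # Slow shape: for each term, run one SPARQL query (query overhead x 10^8 terms).
    # Fast shape: one pass over the dump, aggregating as you go.
    import gzip
    from collections import Counter

    counts = Counter()   # for a full-size dump you'd sort the file and merge runs instead
    with gzip.open("basekb.nt.gz", "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            if line.startswith("<"):
                subject = line[1:line.index(">")]
                counts[subject] += 1            # e.g. how many statements mention each subject

    for uri, n in counts.most_common(10):
        print(n, uri)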
To go back to Sindice, they developed a framework for building
a full-text index out of RDF data while bypassing the triple store.
It's fast and very scalable.
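The general idea is easy to sketch (this is just the idea, not
Sindice's actual pipeline; the label predicate and file name are
assumptions): read the N-Triples stream once and build the inverted
index directly, never touching a triple store.

    # Hedged sketch of the idea, not Sindice's pipeline: build a tiny inverted
    # index straight from an N-Triples stream, bypassing the triple store.
    import gzip, re
    from collections import defaultdict

    LABEL = re.compile(r'^<(\S+)> <http://www\.w3\.org/2000/01/rdf-schema#label> "(.*?)"')
    index = defaultdict(set)                        # token -> set of subject URIs

    with gzip.open("dump.nt.gz", "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LABEL.match(line)
            if m:
                uri, label = m.groups()
                for token in re.findall(r"\w+", label.lower()):
                    index[token].add(uri)           # real systems write postings to disk

    print(sorted(index.get("einstein", ())))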
I've played up the option of loading :BaseKB into a triple store
because it really is straightforward, flexible and a lot of fun.
However, you can do really amazing things with this kind of RDF dump
without the triple store, even on a 32-bit machine.
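For instance (the file name and the predicate filter are
assumptions), you can slice a dump like :BaseKB down to just the
statements you care about with a one-pass filter whose memory use
stays flat no matter how big the file is:

    # Minimal sketch: work with the dump directly, no triple store.  Streams the
    # file line by line, so memory stays flat even on a 32-bit machine.
    import gzip

    WANTED = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"   # predicate to keep (assumed)

    with gzip.open("basekb.nt.gz", "rt", encoding="utf-8", errors="replace") as src, \
         open("types.nt", "w", encoding="utf-8") as dst:
        for line in src:
            if WANTED in line:      # crude predicate filter on the raw line
                dst.write(line)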