Re: Screencast on how to get started developing for stanbol

Rupert Westenthaler Mon, 11 Mar 2013 06:52:11 -0700

On Mon, Mar 11, 2013 at 1:53 PM, Reto Bachmann-Gmür <[email protected]> wrote:
> On Mon, Mar 11, 2013 at 9:49 AM, Rupert Westenthaler <
> [email protected]> wrote:
>
>> On Mon, Mar 11, 2013 at 9:03 AM, Reto Bachmann-Gmür <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > I missed this registration mechanism for the Stanbol endpoint. Why can't
>> > all TripleCollections in TcManager be queried?
>> >
>>
>> This was a design decision. There was never the intention to publish
>> all (Clerezza) TripleCollections but rather to allow Stanbol
>> components to provide a SPARQL service over their RDF data.
>>
>
> By now we have some triple collections in TcManager that will typically
> not be queried (the system graph and the recipe graph); I'm not sure which
> ones you wanted to hide away back then.


In OSGi you never know which other modules are using Clerezza to store
data. The current design ensures that only explicitly configured graphs
are exposed. This assumption seemed to be a good one to get started with.
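To illustrate the opt-in idea, here is a minimal sketch, assuming a simple registry that sits in front of the endpoint: every graph stored through TcManager exists, but only the ones a component explicitly registers can be queried. The class, the graph names and the triple-pattern "query" are hypothetical stand-ins, not the actual Stanbol or Clerezza APIs.

```python
# Hypothetical sketch of opt-in graph publishing; not Stanbol/Clerezza code.

class SparqlEndpoint:
    def __init__(self):
        # Only graphs explicitly registered here are queryable.
        self._exposed = {}

    def register(self, graph_name, triples):
        self._exposed[graph_name] = triples

    def exposed_graphs(self):
        return sorted(self._exposed)

    def query(self, graph_name, predicate):
        """Return (subject, object) pairs for one predicate (stand-in for SPARQL)."""
        if graph_name not in self._exposed:
            raise KeyError(f"{graph_name} is not exposed via /sparql")
        return sorted((s, o) for (s, p, o) in self._exposed[graph_name] if p == predicate)


# Everything other bundles may have stored via TcManager (hypothetical names).
tc_manager = {
    "urn:example:system-graph": {("ex:admin", "ex:role", "ex:root")},
    "urn:example:enhancements": {("ex:doc1", "ex:mentions", "ex:Paris")},
}

endpoint = SparqlEndpoint()
# Only the graph a component explicitly configures gets published.
endpoint.register("urn:example:enhancements", tc_manager["urn:example:enhancements"])

print(endpoint.exposed_graphs())  # ['urn:example:enhancements']
print(endpoint.query("urn:example:enhancements", "ex:mentions"))  # [('ex:doc1', 'ex:Paris')]
```

The system graph stays private simply because nobody registered it, which is the safe default when unknown bundles share the same store.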

>
> What seems a bit strange with the current approach is that persistent
> triple collections appear twice on the whiteboard: once added by TcManager
> and once added to register them with the SPARQL endpoint.
>

Yes, using the whiteboard is not an optimal solution, but it is OK for
the current scope of the /sparql endpoint.

>
>> With the adoption of the two-layered storage solution for the Contenthub
>> (and later also the Entityhub), we might need to rethink the /sparql
>> endpoint so that it also supports querying RDF graphs not managed by
>> Clerezza.
>
>
> I don't understand the link to the two-layered Contenthub.
>

Because then an Entityhub Site can consist of a "Store" and one or more
"Semantic Indexes". Currently the Entityhub typically cannot provide
SPARQL, as the SolrYard does not support it, but then you can use a
triple store as the "Store" and Solr for the "Semantic Index". The Store
could then register itself with the SPARQL endpoint, so users would be
able to query all Entityhub Sites via SPARQL, which would make the
SPARQL endpoint much more interesting.
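A rough sketch of that composition, assuming a Site object that owns one Store plus any number of Semantic Indexes and that only a SPARQL-capable Store registers itself on activation. All class names, site names and the `supports_sparql` flag are hypothetical illustrations, not Entityhub APIs.

```python
# Hypothetical sketch: an Entityhub-style Site = Store + Semantic Index(es).

class SparqlRegistry:
    """Stand-in for the list of stores the /sparql endpoint can query."""
    def __init__(self):
        self.stores = {}

    def register(self, name, store):
        self.stores[name] = store


class TripleStoreBackend:
    # Assumed triple-store "Store" (e.g. Jena TDB based) that can answer SPARQL.
    supports_sparql = True


class SolrYardBackend:
    # Solr-only storage: no SPARQL support.
    supports_sparql = False


class SolrSemanticIndex:
    """Hypothetical "Semantic Index" built from the Store's data."""


class EntityhubSite:
    def __init__(self, name, store, indexes=()):
        self.name = name
        self.store = store
        self.indexes = list(indexes)

    def activate(self, registry):
        # Only a Store that can answer SPARQL registers with the endpoint.
        if self.store.supports_sparql:
            registry.register(self.name, self.store)


registry = SparqlRegistry()
EntityhubSite("dbpedia", TripleStoreBackend(), [SolrSemanticIndex()]).activate(registry)
EntityhubSite("legacy", SolrYardBackend()).activate(registry)
print(sorted(registry.stores))  # ['dbpedia']
```

Keeping search in the index and SPARQL in the store means adding a triple store to a Site is enough to make it show up on the endpoint.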

>
>> I have already implemented a native Jena TDB store for the Entityhub
>> in the "contenthub-two-layered-structure" branch.
>>
>
> Why this?

Performance. With Clerezza, semantic indexing has to convert
"Jena TDB > Jena > Clerezza Graph > Solr InputDocument", while using the
Jena TDB APIs directly allows a "Jena TDB > Solr InputDocument" pipeline.
If you index 8 million DBpedia concepts or the ~120 million MusicBrainz
entities, this makes a big difference.
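To make the difference concrete, the sketch below models both pipelines as chains of conversion steps and counts how many conversions each performs. The converter functions and the dict "documents" are hypothetical stand-ins for the real Jena, Clerezza and Solr objects; the numbers are step counts only, not measured timings.

```python
# Hypothetical sketch: layered vs direct conversion, counted per entity.

def tdb_to_jena(e):
    return dict(e, layer="jena")        # stand-in for wrapping TDB nodes as Jena objects

def jena_to_clerezza(e):
    return dict(e, layer="clerezza")    # stand-in for the Jena -> Clerezza adapter

def clerezza_to_solr(e):
    return {"id": e["id"], "fields": e["fields"]}  # stand-in Solr InputDocument

def tdb_to_solr(e):
    return {"id": e["id"], "fields": e["fields"]}  # direct path, same output

LAYERED = [tdb_to_jena, jena_to_clerezza, clerezza_to_solr]
DIRECT = [tdb_to_solr]

def index(entities, pipeline):
    """Run each entity through the pipeline; return docs and conversions performed."""
    docs = []
    for e in entities:
        for step in pipeline:
            e = step(e)
        docs.append(e)
    return docs, len(entities) * len(pipeline)

def conversions_for(n_entities, pipeline):
    return n_entities * len(pipeline)

sample = [
    {"id": "dbpedia:Paris", "fields": {"label": "Paris"}},
    {"id": "dbpedia:Vienna", "fields": {"label": "Vienna"}},
]
layered_docs, layered_steps = index(sample, LAYERED)
direct_docs, direct_steps = index(sample, DIRECT)
print(layered_steps, direct_steps)               # 6 2
print(conversions_for(120_000_000, LAYERED))     # 360000000
```

Both paths produce the same documents; the layered one simply does three conversions per entity instead of one, which adds up at MusicBrainz scale.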

best
Rupert

>
> Cheers,
> Reto



--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
