On 2/5/14 5:56 PM, Kevin wrote:

Kingsley,

I would like to make some final clarifications before I begin. This is a big, time-consuming decision, so I want to choose wisely.

-Should I transform only the portion of my database (half) that is going to need provenance reification?


Yes, but when you say database, this really boils down to the named graphs in the quad store associated with the RDF statements that you want to describe (or reify).

Unfortunately, it seems I should do all or nothing, so that there is one consistent way to construct a query.


Scope your CONSTRUCT to the relevant named graphs, as per my comments above. You could even tweak the code of our bulk loader to include descriptions of all the RDF statements it loads. Longer term, the loader will include this option; we just haven't got round to adding it because of other technical priorities.
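A minimal sketch of such a scoped CONSTRUCT, written as a Python helper that only assembles the SPARQL 1.1 query text (it does not connect to Virtuoso). The graph IRIs are hypothetical placeholders. The sketch relies on the SPARQL 1.1 rule that a blank node in a CONSTRUCT template is fresh for every solution, so each matched triple gets its own rdf:Statement node, and on VALUES to restrict the scan to the listed graphs.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Characters that would break out of the <...> IRI syntax in the query text.
UNSAFE_IRI_CHARS = set('<>" {}|^`\\')


def reify_construct_query(graph_iris):
    """Return a SPARQL 1.1 CONSTRUCT query that describes every triple found
    in the given named graphs as an rdf:Statement (subject/predicate/object)."""
    if not graph_iris:
        raise ValueError("at least one named graph IRI is required")
    for g in graph_iris:
        if UNSAFE_IRI_CHARS & set(g):
            raise ValueError(f"unsafe IRI: {g!r}")
    values = " ".join(f"<{g}>" for g in graph_iris)
    return (
        f"PREFIX rdf: <{RDF_NS}>\n"
        "CONSTRUCT {\n"
        "  _:st a rdf:Statement ;\n"  # fresh blank node per solution
        "       rdf:subject ?s ;\n"
        "       rdf:predicate ?p ;\n"
        "       rdf:object ?o .\n"
        "}\n"
        "WHERE {\n"
        f"  VALUES ?g {{ {values} }}\n"  # only these named graphs
        "  GRAPH ?g { ?s ?p ?o }\n"
        "}\n"
    )


# Hypothetical graph IRIs standing in for the graphs that need provenance.
print(reify_construct_query(["http://example.org/g1", "http://example.org/g2"]))
```

To store the result instead of returning it, the same patterns could be placed in a SPARQL 1.1 Update of the form INSERT { GRAPH <metadata-graph> { ... } } WHERE { ... }.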

-With a reification-style database you basically only query four predicates (IS-A, HAS-SUBJECT, HAS-PREDICATE, HAS-OBJECT), and the predicate is what the primary index of the default indexing scheme is keyed on. Normally Virtuoso caches many triples for the queried predicate, which is futile with only four predicates. This alone seems likely to hurt the performance of future queries.


You are just adding more statements to a DBMS that is optimized at the physical storage level (column-wise), the working-set level (key compression), and the index level (various index schemes), all aimed at handling very large numbers of RDF statements.

-Do you strongly advise against putting the ID inside the graph name for some reason?


Use whatever identifier works for you; these reification statements can be associated with their own named graphs too. In addition, you could put all the named graphs holding this kind of metadata into a graph group.

Basically, I would make the graph name the unique ID. To group a portion of the 1 million named graphs, I would then have to use a SPARQL FILTER to text-search for "my_uri1_*" to get all the <my_uri1_<id>> graphs. Is this whole concept a bad idea compared with the reification approach?


Yes. Just use reification statements associated with a metadata named graph and/or graph groups.
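A minimal sketch of what such a metadata named graph could contain, emitted as N-Quads lines that a bulk loader could ingest. Everything here is an assumption for illustration: the graph and statement IRIs are hypothetical, objects are limited to IRIs (no literals), and dcterms:created stands in for whatever provenance you actually record. The sequential statement IDs are only unique within one call, so a real load would need a batch-specific stmt_base.

```python
from datetime import datetime, timezone

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value):
    """Wrap a plain IRI string in angle brackets (N-Quads syntax)."""
    return f"<{value}>"


def reification_nquads(triples, meta_graph, stmt_base, created):
    """Yield N-Quads lines describing each IRI-only (s, p, o) triple as an
    rdf:Statement stored in meta_graph, plus a dcterms:created timestamp.
    Statement IDs are stmt_base + 1, 2, ... and are only unique per call."""
    g = iri(meta_graph)
    stamp = f'"{created.isoformat()}"^^{iri(XSD_DATETIME)}'
    for n, (s, p, o) in enumerate(triples, start=1):
        st = iri(f"{stmt_base}{n}")
        yield f"{st} {iri(RDF + 'type')} {iri(RDF + 'Statement')} {g} ."
        yield f"{st} {iri(RDF + 'subject')} {iri(s)} {g} ."
        yield f"{st} {iri(RDF + 'predicate')} {iri(p)} {g} ."
        yield f"{st} {iri(RDF + 'object')} {iri(o)} {g} ."
        yield f"{st} {iri(DCT + 'created')} {stamp} {g} ."


# Hypothetical data: "Rover is a Dog", described in a metadata graph.
triples = [("http://example.org/Rover", RDF + "type", "http://example.org/Dog")]
when = datetime(2014, 2, 5, 17, 56, tzinfo=timezone.utc)
lines = list(reification_nquads(
    triples, "http://example.org/meta", "http://example.org/stmt/", when))
print("\n".join(lines))
```

Because every statement node sits in one metadata graph (which could in turn be added to a graph group), selecting a batch becomes a GRAPH pattern rather than a string FILTER over a million graph IRIs.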

-Lastly, should I forget about trying to find a unique ID per triple in the Virtuoso architecture?


I don't see why you would venture down this path, since you ultimately need this kind of metadata expressed as RDF statements too. The Virtuoso DBMS engine (v7.x) is all about handling large numbers of RDF statements in a scalable fashion.

Kingsley

Regards,

Kevin

*From:* Kingsley Idehen [mailto:kide...@openlinksw.com]
*Sent:* Wednesday, February 5, 2014 4:18 PM
*To:* virtuoso-users@lists.sourceforge.net
*Subject:* Re: [Virtuoso-users] Require Unique integer ID for each RDF Triple

On 2/5/14 4:26 PM, Kevin wrote:

    Kingsley,

    Thank you for personally taking the time to give such an insightful
    explanation.  Without reification (adding 4x triples - ouch), is
    there any hope of getting a unique integer index per triple (see
    my initial ideas)?  Currently I can put the ID in the graph name
    (i.e. <my_iri_<id>>), but that really destroys the coarse-grained
    intent of graph names and graph groups.

    I assume your reification solution would transform ROVER IS-A DOG
    into:

    ID1 IS-A STATEMENT

    ID1 HAS-SUBJECT ROVER

    ID1 HAS-PREDICATE IS-A

    ID1 HAS-OBJECT DOG


Yes, that's basically the intent of the RDF reification vocabulary.
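The ROVER IS-A DOG transformation can be sketched as a toy, in-memory Python function. Assumptions: it keeps the placeholder labels from the example (IS-A, STATEMENT, HAS-SUBJECT, ...) where real RDF would use rdf:type, rdf:Statement, rdf:subject, rdf:predicate and rdf:object, and the integer counter is a hypothetical stand-in for the per-triple ID being asked for. The point it illustrates is that the statement node's ID is itself the per-triple identifier that metadata can hang off.

```python
from itertools import count


def reify(triples, start=1):
    """Give each distinct (s, p, o) triple an integer ID and return
    (id_of, statements): id_of maps triple -> ID, and statements holds the
    four reification triples per ID, using the placeholder labels."""
    id_of = {}
    statements = []
    next_id = count(start)
    for triple in triples:
        if triple in id_of:  # a repeated triple keeps its first ID
            continue
        n = next(next_id)
        id_of[triple] = n
        node = f"ID{n}"
        s, p, o = triple
        statements += [
            (node, "IS-A", "STATEMENT"),
            (node, "HAS-SUBJECT", s),
            (node, "HAS-PREDICATE", p),
            (node, "HAS-OBJECT", o),
        ]
    return id_of, statements


id_of, statements = reify([("ROVER", "IS-A", "DOG")])
for st in statements:
    print(*st)
# ID1 IS-A STATEMENT
# ID1 HAS-SUBJECT ROVER
# ID1 HAS-PREDICATE IS-A
# ID1 HAS-OBJECT DOG
```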

While this may be my best solution, do you agree this suffers from the following:

-Triple count, and thus memory, increases by ~4x.


Not if the DBMS engine has key compression and column-wise storage, which are both features of Virtuoso 7.x.
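For the logical statement count, the arithmetic can be checked with a small hypothetical helper. It counts RDF statements only and does not model physical size after key compression or column-wise storage, which is the point of the reply above. Each reified triple adds four statements (type, subject, predicate, object) plus any metadata statements; the "~4x" figure corresponds to replacing every triple, keeping the originals gives 5x, and reifying only half of them while keeping all originals gives 3x.

```python
def logical_statement_count(n_triples, fraction_reified,
                            keep_originals=True, metadata_per_statement=0):
    """Count RDF statements after reifying a fraction of n_triples.

    Each reified triple contributes 4 statements (rdf:type rdf:Statement,
    rdf:subject, rdf:predicate, rdf:object) plus metadata_per_statement
    extra ones (e.g. a timestamp). Physical size after compression is
    not modelled here."""
    if not 0 <= fraction_reified <= 1:
        raise ValueError("fraction_reified must be between 0 and 1")
    reified = round(n_triples * fraction_reified)
    kept = n_triples if keep_originals else n_triples - reified
    return kept + reified * (4 + metadata_per_statement)


n = 1_000_000
print(logical_statement_count(n, 1.0, keep_originals=False))      # 4000000
print(logical_statement_count(n, 1.0))                            # 5000000
print(logical_statement_count(n, 0.5))                            # 3000000
print(logical_statement_count(n, 0.5, metadata_per_statement=1))  # 3500000
```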


-Performance and caching are hurt by the more complex queries (more predicates).


Not so, if you have column-wise storage, key compression, and vectorized execution of queries, all of which are Virtuoso 7.x features.


-Finally, only half my triples require provenance reification, resulting in an ugly hybrid (normal and reified).


Not really, I suggest you try this with Virtuoso 7.x :-)


Kingsley

Regards,

Kevin

*From:* Kingsley Idehen [mailto:kide...@openlinksw.com]
*Sent:* Wednesday, February 5, 2014 2:38 PM
*To:* virtuoso-users@lists.sourceforge.net
*Subject:* Re: [Virtuoso-users] Require Unique integer ID for each RDF Triple

On 2/5/14 2:57 PM, Kingsley Idehen wrote:

    On 2/5/14 2:17 PM, Kevin wrote:

        Virtuoso Fans,

        For a year I have really needed Virtuoso to provide a unique
        integer ID for each RDF triple. In other words, I would like
        Virtuoso to store SPOGI (like AllegroGraph) instead of just
        SPOG. People often question whether I really need the index. It
        is essential for numerous reasons, including storing unique
        information (metadata) about each triple (e.g. a timestamp)
        and allowing full use of the Yago2 database. The
        Yago2 <http://www2007.org/papers/paper391.pdf> database
        (search "fact identifier") and the AllegroGraph
        <http://franz.com/agraph/support/documentation/current/triple-index.html>
        triple store have embraced this metadata concept, as it
        unleashes some powerful capabilities. If a Virtuoso trick can be
        found to provide such an index, I think a complete Semantic Web
        solution will be born.

        My current approach is to give each triple its own named
        graph. While this solution partially works, it hurts
        performance and feels like an ugly hack. In addition, it
        makes it hard to use graph names in the coarse-grained
        manner intended. You can group graphs to achieve larger
        categories, but isn't a million graphs in a graph group
        impractical?

        Can someone with internal Virtuoso knowledge devise a way to
        get a unique integer ID per triple? Perhaps a way exists to
        access the row ID of the underlying RDBMS? Could the indexing
        scheme be augmented in some way to yield the index I am
        seeking? As a crazy last resort, could the Virtuoso Open Source
        code base be altered to provide the index ID?

        Regards,

        Kevin


    Kevin,

    You are asking for reification of triples stored in Virtuoso.
    Nothing stops you generating the reified triples right now, bar
    processing time.

    All you do is forward-chain over all the triples, creating new
    relations that associate each triple with an rdf:Statement [1],
    i.e., an rdf:subject [2], rdf:predicate [3], and rdf:object [4]
    relation per triple.

    Today, you can LOAD YAGO's reified triples into Virtuoso; we
    tested that a long time ago. We even do that with Uniprot [5].

    Links:

    1. http://www.w3.org/TR/rdf-schema/#ch_statement -- about RDF
    Statement entity type
    2. http://www.w3.org/TR/rdf-schema/#ch_subject -- about RDF
    subject relation
    3. http://www.w3.org/TR/rdf-schema/#ch_predicate -- about RDF
    predicate relation
    4. http://www.w3.org/TR/rdf-schema/#ch_object -- about RDF object
    relation
    5. http://bit.ly/W8MYMj -- Uniprot reified statements example
    (using the 50 billion+ RDF statements LOD Cloud Cache).

    Kingsley


Kevin,

Regarding #5, you can use: http://lod.openlinksw.com/c/F35JIZE. The original's timeout setting is too low; this one is set to 60 seconds.



--
Regards, Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web:http://www.openlinksw.com
Personal Weblog:http://www.openlinksw.com/blog/~kidehen  
<http://www.openlinksw.com/blog/%7Ekidehen>
Twitter Profile:https://twitter.com/kidehen
Google+ Profile:https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile:http://www.linkedin.com/in/kidehen
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users