Hi Denny,

On 03.06.19 23:31, Denny Vrandečić wrote:
Thanks a lot for the answer, that is super useful.

I'll see if I can get the canonicalized version recreated :)

One question though: is there a cleaned version of the DBpedia ontology mapping-based data? I only found the uncleaned version.


Two scripts are still missing: the type-consistency check and redirect resolution. We will run them after the Unicode bug is fixed.


Do you have any plans for when the next release of DBpedia is going to be available?


These are signed with the public key from http://webid.dbpedia.org/webid.ttl, so they effectively are the next releases. The structure will stay the same, and each release should be a bit better than the previous one. We are just working on that handful of issues and on a better way to comment on mistakes before we announce the releases on all channels.

It is an open platform now. We will have core dataset releases (including raw data) and then the community can create their own additions.

https://propi.github.io/webid.ttl will add the LHD dataset, Heiko Paulheim will add DBkWik, etc.

If you do any analysis, you can get an account and publish the data on the bus. Links like https://databus.dbpedia.org/denny/analysis are stable redirects to files, just like purl.org.
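
As a minimal sketch of using such a stable link (the account/path is just the illustrative link above and is assumed to answer with an HTTP redirect to a concrete file):

# Minimal sketch: follow a Databus stable link to the file it points to.
# The account/path is just the illustrative link from above.
import urllib.request

stable_link = "https://databus.dbpedia.org/denny/analysis"

with urllib.request.urlopen(stable_link) as response:
    # urlopen follows redirects, so geturl() gives the resolved location
    print("resolved to:", response.geturl())
    data = response.read()        # the file behind the stable link
    print("fetched", len(data), "bytes")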


-- Sebastian






On Mon, Jun 3, 2019 at 2:19 PM Sebastian Hellmann <hellm...@informatik.uni-leipzig.de> wrote:

    Hi Denny,

    you didn't really find them, because they are not yet publicly
    released. Please treat them as a beta.

    The main reason is that there are a handful of missing features
    and a handful of stupid bugs.

    Two examples:

    - we discovered a Unicode issue in URIs; it still allows valid
    analysis, but it would not allow the data to be loaded into
    dbpedia.org/sparql

    - we built the Databus to have a group changelog and a
    dataset/artifact changelog; however, these can only be changed at
    release time, so we cannot document reported errors, like the one
    above, after a release has been published.

    It is not hard, and Marvin has already done new extractions:
    https://databus.dbpedia.org/marvin - there is just a bit missing.


    i.e. files such as
    
http://downloads.dbpedia.org/2016-10/core-i18n/de/mappingbased_objects_wkd_uris_de.ttl.bz2
    - can you point me where I can find the canonicalized versions in
    the new files?

    These are discontinued. Instead there is:

    https://databus.dbpedia.org/dbpedia/id-management/global-ids
    which is loaded into this webservice:
    https://global.dbpedia.org/same-thing/lookup/?uri=http://www.wikidata.org/entity/Q8087
    where you can resolve many URIs against their clusters (see the
    sketch further below).

    and the fused and enriched versions as described in
    https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf

    FlexiFusion is more systematic and can rewrite any dataset's
    subjects with any other subject from the ID management, so we could
    produce these datasets anyway.
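
    Regarding the same-thing lookup mentioned above, a minimal sketch of
    calling it from Python; the JSON layout of the answer is treated as
    unknown here and simply printed:

    # Minimal sketch: resolve a URI against the ID-management clusters via
    # the same-thing lookup service above. The JSON layout of the answer is
    # an assumption, so the sketch just prints whatever comes back.
    import json
    import urllib.parse
    import urllib.request

    uri = "http://www.wikidata.org/entity/Q8087"
    endpoint = ("https://global.dbpedia.org/same-thing/lookup/?uri="
                + urllib.parse.quote(uri, safe=""))

    with urllib.request.urlopen(endpoint) as response:
        cluster = json.load(response)

    print(json.dumps(cluster, indent=2))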


    Thanks for these pointers! I have run a few analyses, and now can
    rerun them again with the actual current data :) I expect this to
    improve DBpedia numbers by quite a bit.
    You could also try the fused version:
    https://databus.dbpedia.org/dbpedia/fusion - this is the one we are
    working on most, and it will aggregate a lot more data in the future.


    I find it all a bit hard to navigate (although Databus has a few
    really neat features, thanks for that).

    Any feedback is welcome; the issue tracker is linked at the top of
    the website.


    Yes, another missing feature. However, we thought that the pros
    would just look at the DataID files and then write SPARQL queries
    at https://databus.dbpedia.org/yasgui/
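
    For example, a minimal sketch of such a query sent to the repo
    endpoint https://databus.dbpedia.org/repo/sparql; the dataid:/dct:
    terms used here are assumptions, so check the actual DataID files
    for the exact vocabulary:

    # Minimal sketch: list some dataset versions registered on the Databus.
    # dataid:Dataset and dct:hasVersion are assumed terms; check the actual
    # dataid files for the exact vocabulary.
    import urllib.parse
    import urllib.request

    query = """
    PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
    PREFIX dct:    <http://purl.org/dc/terms/>
    SELECT ?dataset ?version WHERE {
      ?dataset a dataid:Dataset ;
               dct:hasVersion ?version .
    } LIMIT 20
    """

    url = ("https://databus.dbpedia.org/repo/sparql?"
           + urllib.parse.urlencode({"query": query}))
    req = urllib.request.Request(url, headers={"Accept": "text/csv"})
    with urllib.request.urlopen(req) as response:
        print(response.read().decode("utf-8"))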

    -- Sebastian


    On 03.06.19 19:49, Denny Vrandečić wrote:
    Oh, wow, thanks Sebastian, thanks Kingsley for the answers!

    I was entirely unaware of the DBpedia datasets over at
    databus.dbpedia.org - when I search
    for "dbpedia downloads" that's not where I get to. Also, when I
    go to dbpedia.org and then click on
    "Downloads", I get to the 2016 datasets.

    https://wiki.dbpedia.org/Datasets

    https://wiki.dbpedia.org/develop/datasets

    I honestly thought that the 2016 dataset was the latest one, and
    was rather disappointed. Thank you for showing me that I was just
    looking in the wrong place - but I would really suggest that you
    update your websites to point to the Databus. I am sure I am not
    the only one who believes that there has been no DBpedia update
    since 2016.

    Thanks for these pointers! I have run a few analyses, and now can
    rerun them again with the actual current data :) I expect this to
    improve DBpedia numbers by quite a bit.

    One question: I liked using the canonicalized versions from here
    https://wiki.dbpedia.org/downloads-2016-10, i.e. files such as
    
http://downloads.dbpedia.org/2016-10/core-i18n/de/mappingbased_objects_wkd_uris_de.ttl.bz2
    - can you point me where I can find the canonicalized versions in
    the new files? I find it all a bit hard to navigate (although
    Databus has a few really neat features, thanks for that).

    Cheers,
    Denny





    On Sat, Jun 1, 2019 at 9:43 AM Kingsley Idehen
    <kide...@openlinksw.com> wrote:

        On 6/1/19 5:45 AM, Sebastian Hellmann wrote:

        Hi Denny,

        * the old system was like this:

        we load from here: http://downloads.dbpedia.org/2016-10/core/

        metadata is in
        http://downloads.dbpedia.org/2016-10/core/2016-10_dataid_core.ttl
        with void:sparqlEndpoint <http://dbpedia.org/sparql> ;


        Hi Sebastian,


        I will also have the TTL referenced above loaded into a named
        graph so that it becomes accessible from the query solution I
        shared in my prior post.



        * the new system is here: https://databus.dbpedia.org/dbpedia

        There are 6 new releases and the metadata is in the endpoint
        https://databus.dbpedia.org/repo/sparql

        Once the collection-saving feature is finished, we will
        build a collection of datasets on the bus, which will then
        be loaded. It is basically a SPARQL query retrieving the
        download URLs, like this:

        http://dev.dbpedia.org/Data#example-application-virtuoso-docker
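
        A minimal sketch of such a download-URL query against the repo
        endpoint; the DCAT terms used here are an assumption, and the
        authoritative query is behind the link above:

        # Minimal sketch: retrieve download URLs of distributions on the bus.
        # dcat:distribution and dcat:downloadURL follow DCAT conventions and
        # are an assumption here; the authoritative query is linked above.
        import urllib.parse
        import urllib.request

        query = """
        PREFIX dcat: <http://www.w3.org/ns/dcat#>
        SELECT ?file WHERE {
          ?dataset dcat:distribution ?dist .
          ?dist dcat:downloadURL ?file .
        } LIMIT 20
        """

        url = ("https://databus.dbpedia.org/repo/sparql?"
               + urllib.parse.urlencode({"query": query}))
        req = urllib.request.Request(url, headers={"Accept": "text/csv"})
        with urllib.request.urlopen(req) as response:
            print(response.read().decode("utf-8"))  # one download URL per row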


        Okay.

        Please install the Faceted Browser so that URIs like
        http://dev.dbpedia.org/Data#example-application-virtuoso-docker
        can also be looked up.

        As an aside, here's an Entity Type overview query results
        page
        
<https://databus.dbpedia.org/repo/sparql?default-graph-uri=&query=SELECT+%28SAMPLE%28%3Fs%29+AS+%3Fsample%29+%28COUNT%281%29+AS+%3Fcount%29++%28%3Fo+AS+%3FentityType%29%0D%0AWHERE+%7B%0D%0A++++++++%3Fs+a+%3Fo.+%0D%0A%09%09FILTER+%28isIRI%28%3Fs%29%29+%0D%0A++++++++++++++++FILTER+%28%21+contains%28str%28%3Fs%29%2C%22virt%22%29%29%0D%0A++++++%7D+%0D%0AGROUP+BY+%3Fo%0D%0AORDER+BY+DESC+%28%3Fcount%29&format=text%2Fhtml&timeout=0&debug=on>
        for future use, etc.


        Kingsley




        On 31.05.19 21:59, Denny Vrandečić wrote:
        Thank you for the answer!

        I don't see how the query solution page that you linked
        indicates that this is the English Wikipedia extraction.
        Where does it say that? How can I tell? I am trying to
        understand, thanks.

        Also, when I download the set of English extractions from here,

        http://downloads.dbpedia.org/2016-10/core-i18n/en/

        particularly this one,

        
http://downloads.dbpedia.org/2016-10/core-i18n/en/mappingbased_objects_en.ttl.bz2


        it contains only 17,467 people with parents, not 20,120, so
        that dataset seems to be out of sync with the one in the
        SPARQL endpoint.
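
        For reference, a minimal sketch of how such a count can be
        obtained from that dump, assuming a simple line-based scan of
        the N-Triples file is sufficient:

        # Minimal sketch: count distinct subjects with a dbo:parent triple in
        # the downloaded dump, using a simple line-based scan of the N-Triples.
        import bz2

        PARENT = "<http://dbpedia.org/ontology/parent>"
        subjects = set()

        with bz2.open("mappingbased_objects_en.ttl.bz2", "rt",
                      encoding="utf-8") as f:
            for line in f:
                parts = line.split(None, 2)   # subject, predicate, rest
                if len(parts) == 3 and parts[1] == PARENT:
                    subjects.add(parts[0])

        print(len(subjects), "distinct subjects with dbo:parent")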

        I am curious: where do you load the dataset from?

        Thank you!


        On Fri, May 31, 2019 at 11:49 AM Kingsley Idehen
        <kide...@openlinksw.com> wrote:

            On 5/31/19 2:23 PM, Denny Vrandečić wrote:
            When I query the dbpedia.org/sparql endpoint asking for
            "how many people with a parent do you know?", i.e.
            select (count(distinct ?s) as ?c) where { ?s dbo:parent ?o },
            I get 20,120 as the answer.

            Where among the Downloads on
            wiki.dbpedia.org/downloads-2016-10 can I find
            the dataset that the SPARQL endpoint actually serves?
            Is it the English Wikipedia-based "Mappingbased" one?
            Or is it the "Infobox Properties Mapped" one?

            Cheers,
            Denny


            The query solution page
            
<http://dbpedia.org/sparql?default-graph-uri=&query=prefix+dbo%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+%0D%0A%0D%0Aselect+%3Fg+%28count+%28distinct+%3Fs%29+as+%3Fc%29%0D%0Awhere+%7B+%0D%0A+++++++%0D%0A+++++++++graph+%3Fg+%7B%3Fs+dbo%3Aparent+%3Fo.%7D%0D%0A%0D%0A+++++%7D%0D%0Agroup+by+%3Fg&format=text%2Fhtml&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on&run=+Run+Query+>
            indicates this is the English Wikipedia dataset. That's
            what we've always loaded into the Virtuoso instance
            from which DBpedia Linked Data and its associated
            SPARQL endpoint are deployed.


-- Regards,

            Kingsley Idehen     
            Founder & CEO
            OpenLink Software

-- All the best,
        Sebastian Hellmann





-- Regards,

        Kingsley Idehen 
        Founder & CEO
        OpenLink Software




-- All the best,
    Sebastian Hellmann


--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
_______________________________________________
DBpedia-discussion mailing list
DBpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
