I have always wanted to know whether it's possible to improve the DBpedia
parsers and mappings by comparing the data in DBpedia and Freebase. Are there
any experiments in this direction?
-----
Yury Katkov, WikiVote
On Wed, Apr 10, 2013 at 4:01 AM, Paul A. Houle <[email protected]> wrote:
> PRESS RELEASE
>
> Paul Houle, Ontology2 founder, stated that "we updated Infovore to accept
> data from DBpedia, and ran a head-to-head test, in terms of RDF validity,
> between Freebase and DBpedia Live."
>
> "Unlike most scientific results", he said, "these results are repeatable,
> because you can reproduce them yourself with Infovore 1.1. I encourage you
> to use this tool to put other RDF data sets, large and small, to the
> test."
>
> The tool parallelSuperEyeball was run against both the 2013-03-31 Freebase
> RDF dump and the 2012-04-30 edition of DBpedia Live.
>
> Freebase asserts roughly 1.2 billion facts, but Infovore rejects roughly
> 200 million of them as useless during pre-filtering. Downstream of that we
> found 944,909,025 valid facts and 66,781,906 invalid facts, in addition to
> 5 especially malformed facts.
>
> This is a serious regression compared to the 2013-01-27 RDF dump, in which
> only about 13 million invalid triples were found. The main cause of the
> increase is the introduction of 40 million or so "triples" that lack an
> object and use the predicate ns:common.topic.notable_for. Previously, the
> bulk of the invalid triples were incorrectly formatted dates.
>
> The rate of invalid triples in DBpedia Live was found to be orders of
> magnitude lower than in Freebase.
>
> Only 8,664 invalid facts were found in DBpedia Live, compared to
> 247,557,030 valid facts. The predominant problem in DBpedia Live turned out
> to be nonconformant IRIs that came in from Wikipedia. This is comparable in
> magnitude to the number of facts found invalid in the old Freebase quad
> dump while creating :BaseKB Pro.
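To put "orders of magnitude" into numbers, here is a quick back-of-the-envelope check in Python that uses only the counts quoted above. It assumes (my choice, not something the announcement states) that the invalid rate is invalid / (valid + invalid), leaving out the roughly 200 million pre-filtered Freebase facts and the 5 especially malformed ones.

```python
import math

# Counts quoted in the announcement (Freebase 2013-03-31 dump vs. DBpedia Live).
FREEBASE_VALID = 944_909_025
FREEBASE_INVALID = 66_781_906
DBPEDIA_VALID = 247_557_030
DBPEDIA_INVALID = 8_664


def invalid_rate(valid: int, invalid: int) -> float:
    """Fraction of checked facts that failed validation (assumed definition)."""
    return invalid / (valid + invalid)


freebase_rate = invalid_rate(FREEBASE_VALID, FREEBASE_INVALID)
dbpedia_rate = invalid_rate(DBPEDIA_VALID, DBPEDIA_INVALID)
ratio = freebase_rate / dbpedia_rate

print(f"Freebase invalid rate:     {freebase_rate:.4%}")
print(f"DBpedia Live invalid rate: {dbpedia_rate:.4%}")
print(f"Ratio: {ratio:,.0f}x (about 10^{math.log10(ratio):.1f})")
```

With these counts the Freebase rate comes out near 6.6% and the DBpedia Live rate near 0.0035%, a ratio of roughly 1,900, i.e. about three orders of magnitude.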
>
> Just one of the tools included with Infovore, parallelSuperEyeball is an
> industrial-strength RDF validator that uses streaming processing and the
> Map/Reduce paradigm to attain nearly perfect parallel speedup at many tasks
> on common four-core computers. Infovore 1.1 brings many improvements,
> including a threefold speedup of parallelSuperEyeball and the new Infovore
> shell.
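For readers unfamiliar with the approach, here is a minimal Python sketch of what streaming, Map/Reduce-style validation looks like in general. It is not Infovore's code (its internals are not described here); the regex is a deliberately simplified stand-in for N-Triples syntax, and the names map_line, reduce_counts and validate_stream are hypothetical.

```python
import re
from collections import Counter
from functools import reduce
from typing import Iterable

# Hypothetical sketch, NOT Infovore's implementation. The pattern is a
# simplified N-Triples shape check, not a full parser: subject IRI or blank
# node, predicate IRI, object IRI / blank node / literal, then " .".
IRI = r"<[^<>\"{}|^`\\\x00-\x20]*>"  # rejects spaces and other illegal IRI chars
BNODE = r"_:[A-Za-z0-9_.-]+"
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + IRI + r")?"
TRIPLE = re.compile(
    rf"^\s*(?:{IRI}|{BNODE})\s+{IRI}\s+(?:{IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


def map_line(line: str) -> Counter:
    """Map step: classify one line without looking at any other line."""
    if not line.strip() or line.lstrip().startswith("#"):
        return Counter()  # blank lines and comments count as neither
    return Counter({"valid": 1} if TRIPLE.match(line) else {"invalid": 1})


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge partial tallies (associative, so order is irrelevant)."""
    return a + b


def validate_stream(lines: Iterable[str]) -> Counter:
    """Consume lines one at a time, so memory use does not grow with input size."""
    return reduce(reduce_counts, map(map_line, lines), Counter())


if __name__ == "__main__":
    sample = [
        '<http://example.org/a> <http://example.org/p> "hello"@en .',
        "<http://example.org/a> <http://example.org/p> .",  # missing object
        "<http://example.org/a b> <http://example.org/p> <http://example.org/o> .",
    ]
    print(validate_stream(sample))
```

Because map_line looks only at its own line and reduce_counts is associative, the input could be split into chunks, mapped in separate worker processes (for example with multiprocessing.Pool.imap), and the partial Counters merged afterward. Independence of that kind is what makes near-linear speedup on a four-core machine plausible.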
>
> Please take a look at our GitHub project at
>
> https://github.com/paulhoule/infovore/wiki
>
> and feel free to fork or star it. Note that many Infovore data products
> are also available at
>
> http://basekb.com/
>
> Because Infovore is memory efficient, it can handle much larger data sets
> than can be kept in a triple store on any given hardware. The main
> limitation in handling large RDF data sets is disk space; within that
> limit, Infovore works quickly because it avoids random-access I/O.
>
> "We challenge RDF data providers to put their data to the test," said Paul
> Houle. "Today people and organizations are expected to publish only valid
> XML files, and the publication of parallelSuperEyeball is a step toward a
> world that speaks valid RDF and that can clean and repair invalid files."
>
> Ontology2 is a privately held company that develops web sites and data
> products based on Freebase, DBpedia, and other sources. Contact
> [email protected] with questions about Ontology2 products and services.
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion