I have always wanted to know whether it's possible to improve the DBpedia
parsers and mappings by comparing data between DBpedia and Freebase. Have
there been any experiments in this direction?
-----
Yury Katkov, WikiVote



On Wed, Apr 10, 2013 at 4:01 AM, Paul A. Houle <[email protected]> wrote:

>   PRESS RELEASE
>
> Paul Houle, Ontology2 founder, stated that "we updated Infovore to accept
> data from DBpedia and ran a head-to-head test of RDF validity between
> Freebase and DBpedia Live."
>
> "Unlike most scientific results", he said, "these results are repeatable,
> because you can reproduce them yourself with Infovore 1.1. I encourage you
> to use this tool to put other RDF data sets, large and small, to the
> test."
>
> The tool parallelSuperEyeball was run against both the 2013-03-31 Freebase
> RDF dump and the 2012-04-30 edition of DBpedia Live.
>
> Freebase asserts roughly 1.2 billion facts, of which Infovore rejects
> roughly 200 million useless facts in pre-filtering. Downstream of that we
> found 944,909,025 valid facts and 66,781,906 invalid facts, in addition to
> 5 especially malformed facts.
>
> This is a serious regression compared to the 2013-01-27 RDF dump, in which
> only about 13 million invalid triples were discovered. The main cause of
> the increase is the introduction of 40 million or so "triples" with the
> predicate ns:common.topic.notable_for that lack an object. Previously, the
> bulk of the invalid triples were incorrectly formatted dates.
>
> The rate of invalid triples in DBpedia Live was found to be orders of
> magnitude lower than in Freebase.
>
> Only 8,664 invalid facts were found in DBpedia Live, compared to
> 247,557,030 valid facts. The predominant problem in DBpedia Live turned
> out to be nonconformant IRIs that came in from Wikipedia. This count of
> invalid facts is comparable in magnitude to the number found invalid in
> the old Freebase quad dump in the process of creating :BaseKB Pro.
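
A quick sanity check of the "orders of magnitude" claim, using only the
counts quoted above. This is a minimal sketch that assumes the invalid rate
is invalid / (valid + invalid), ignoring the ~200 million pre-filtered
Freebase facts and the 5 especially malformed ones:

```python
# Counts quoted in the press release above.
freebase_valid, freebase_invalid = 944_909_025, 66_781_906
dbpedia_valid, dbpedia_invalid = 247_557_030, 8_664


def invalid_rate(valid: int, invalid: int) -> float:
    """Fraction of checked triples that failed validation (assumed definition)."""
    return invalid / (valid + invalid)


freebase_rate = invalid_rate(freebase_valid, freebase_invalid)
dbpedia_rate = invalid_rate(dbpedia_valid, dbpedia_invalid)
ratio = freebase_rate / dbpedia_rate

print(f"Freebase:     {freebase_rate:.3%} invalid")
print(f"DBpedia Live: {dbpedia_rate:.4%} invalid")
print(f"Ratio:        {ratio:,.0f}x")
```

With these counts Freebase comes out at roughly 6.6% invalid and DBpedia
Live at roughly 0.0035%, a ratio of about 1,900, i.e. a bit over three
orders of magnitude.
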
>
> The parallelSuperEyeball tool, just one of those included with Infovore, is
> an industrial-strength RDF validator that uses streaming processing and the
> Map/Reduce paradigm to attain nearly perfect parallel speedup on many tasks
> on common four-core computers. Infovore 1.1 brings many improvements,
> including a threefold speedup of parallelSuperEyeball and the new Infovore
> shell.
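
For readers unfamiliar with the approach, here is a minimal sketch of
streaming, Map/Reduce-style triple validation in Python. It is not
Infovore's implementation: the simplified N-Triples patterns, the error
categories and the names classify_line, count_chunk, chunks and validate
are hypothetical, chosen to mirror the failure modes described above
(objectless triples, malformed dates, nonconformant IRIs). Each worker maps
a chunk of lines to a Counter, and the reduce step sums the Counters.

```python
import re
from collections import Counter
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait
from datetime import date
from itertools import islice
from typing import Iterable, Iterator, List

# Simplified N-Triples building blocks (an assumption: real N-Triples, and
# the prefixed Freebase dump format, have a richer grammar).
IRI = r'<[^<>"{}|^`\\\s]*>'  # rejects spaces and other illegal IRI chars
NODE = r'(?:' + IRI + r'|_:\S+)'  # IRI or blank node
LITERAL = r'"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^' + IRI + r')?'
TRIPLE = re.compile(
    r'^\s*' + NODE + r'\s+' + IRI + r'\s+(' + NODE + r'|' + LITERAL + r')\s*\.\s*$'
)
NO_OBJECT = re.compile(r'^\s*' + NODE + r'\s+' + IRI + r'\s*\.\s*$')
XSD_DATE_SUFFIX = '^^<http://www.w3.org/2001/XMLSchema#date>'
DATE_LITERAL = re.compile(
    r'^"(\d{4})-(\d{2})-(\d{2})(?:Z|[+-]\d{2}:\d{2})?"'
    r'\^\^<http://www\.w3\.org/2001/XMLSchema#date>$'
)


def classify_line(line: str) -> str:
    """Map one line to 'valid', 'missing_object', 'bad_date' or 'invalid'."""
    m = TRIPLE.match(line)
    if m is None:
        return "missing_object" if NO_OBJECT.match(line) else "invalid"
    obj = m.group(1)
    if obj.endswith(XSD_DATE_SUFFIX):
        d = DATE_LITERAL.match(obj)
        if d is None:
            return "bad_date"  # not shaped like YYYY-MM-DD
        try:
            date(int(d.group(1)), int(d.group(2)), int(d.group(3)))
        except ValueError:  # right shape, impossible date (e.g. Feb 30)
            return "bad_date"
    return "valid"


def count_chunk(lines: List[str]) -> Counter:
    """Map step: classify every line in one chunk."""
    return Counter(classify_line(line) for line in lines)


def chunks(lines: Iterable, size: int) -> Iterator[list]:
    """Stream the input in fixed-size chunks instead of loading it whole."""
    it = iter(lines)
    while chunk := list(islice(it, size)):
        yield chunk


def validate(lines: Iterable[str], workers: int = 4, size: int = 100_000) -> Counter:
    """Reduce step: sum per-chunk Counters, keeping few chunks in flight."""
    total: Counter = Counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        pending = set()
        for chunk in chunks(lines, size):
            pending.add(pool.submit(count_chunk, chunk))
            if len(pending) >= 2 * workers:  # bound memory use
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for f in done:
                    total += f.result()
        for f in pending:
            total += f.result()
    return total
```

Usage would look like validate(open("dump.nt", encoding="utf-8")), where
the file name is hypothetical, called under an if __name__ == "__main__":
guard so worker processes can be started safely. Bounding the number of
in-flight chunks keeps memory flat regardless of input size, which is the
point of streaming; the per-chunk work is independent, so it parallelizes
well across cores.
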
>
> Please take a look at our GitHub project at
>
> https://github.com/paulhoule/infovore/wiki
>
> and feel free to fork or star it. Note that many Infovore data products are
> also available at
>
> http://basekb.com/
>
> Because Infovore is memory efficient, it can handle much larger data sets
> than can be kept in a triple store on the same hardware. The main
> limitation in handling large RDF data sets is disk space, which Infovore
> can use up quickly because it avoids random-access I/O and therefore
> processes data fast.
>
> "We challenge RDF data providers to put their data to the test", said Paul
> Houle. "Today it's an expectation that people and organizations publish
> only valid XML files, and the publication of parallelSuperEyeball is a
> step toward a world that speaks valid RDF and that can clean and repair
> invalid files."
>
> Ontology2 is a privately held company that develops web sites and data
> products based on Freebase, DBpedia, and other sources. Contact
> [email protected] with questions about Ontology2 products and services.
>
> ------------------------------------------------------------------------------
> Precog is a next-generation analytics platform capable of advanced
> analytics on semi-structured data. The platform includes APIs for building
> apps and a phenomenal toolset for data science. Developers can use
> our toolset for easy data analysis & visualization. Get a free account!
> http://www2.precog.com/precogplatform/slashdotnewsletter
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion