Hi Antonio,

Thank you for the nice overview.

Let me comment on the following issue:

On Mon, Jul 1, 2013 at 11:36 AM, Antonio Perez <ape...@zaizi.com> wrote:
>
> The indexing tool takes too much time in a standard computer, so in order
> to execute this process, you'll need either a computer with SSD or
>  a computer with 200GB of RAM in order to deal with the whole Freebase data
> dump in memory.
>

Rafa has started to work on an IndexingSource that can operate
directly on the Freebase dump (or on any single-file RDF dump that is
sorted by SPO, i.e. by subject, predicate, object). With such a source
one can index a dataset without first importing the data into an RDF
triple store. As that import is the most hardware-demanding part of
the chain, this should greatly improve indexing performance.

However, this IndexingSource will not support LDPath and will
therefore not support some of the available EntityProcessors.
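
To illustrate why an SPO-sorted dump removes the need for a triple
store, here is a minimal sketch in Python (not the actual Stanbol
IndexingSource, which is Java). Because all triples of one subject are
adjacent in a sorted dump, a single streaming pass can hand out one
complete entity at a time while keeping only that entity in memory.
The names parse_ntriple and iter_entities are hypothetical, and the
parser assumes simplified N-Triples lines (whitespace-separated terms,
terminated by " .").

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def parse_ntriple(line: str) -> Triple:
    """Split one simplified N-Triples line into (subject, predicate, object).

    Assumption: subject and predicate contain no whitespace; everything
    after them, minus the terminating '.', is the object.
    """
    subject, predicate, rest = line.strip().split(None, 2)
    obj = rest.rstrip()
    if obj.endswith("."):
        obj = obj[:-1].rstrip()
    return subject, predicate, obj


def iter_entities(lines: Iterable[str]) -> Iterator[Tuple[str, List[Triple]]]:
    """Yield (subject, triples) for each run of lines sharing a subject.

    This only works if the input is sorted by subject: groupby merges
    adjacent items only, so an unsorted dump would yield the same
    subject several times with partial data.
    """
    triples = (
        parse_ntriple(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    )
    for subject, group in groupby(triples, key=lambda t: t[0]):
        yield subject, list(group)
```

Passing an open file object as lines streams the dump line by line, so
memory use is bounded by the largest single entity rather than by the
size of the dump.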

>
> For the next milestone (midterm evaluation) the following tasks need to be
> done:
> 1.  Convert wiki-links data dump to RDF
>     * Wiki-links contains a lot of disambiguation information which it is
> wanted to incorporate to the Entityhub Freebase site.
>     * The wiki-link data dump will be converted to RDF to be easier to
> process by the new Stanbol Freebase indexing tool (point 2)
>     * The wiki-link expanded dataset [1] will be used because it contains
> information like extracted context for the mentions, alignment to Freebase
> entities, etc.
> 2.  Develop a new stanbol indexer to join Freebase and wiki-links
> information

The expanded dataset [1] is really great in that it allows us to avoid
a lot of very time-consuming tasks (crawling the resources, extracting
the mention text and context, and linking the DBpedia URIs to
Freebase). Without this information, using this great dataset would
not be feasible because of time constraints.

> 3.  Generate a graph with the links in Freebase
>     * To support Graph-based disambiguation algorithms in Stanbol, a graph
> will be generated using Blueprints Neo4j and every node in the graph will
> be associated to entries in the EntityHub to later be used to position
> directly in a node on the graph.
>

IMO this is really interesting, and not only for disambiguation. I am
really looking forward to this. Do not forget to also test the code
with backends that are compatible with the Apache License.
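
As a rough sketch of the idea in point 3 (again Python, and a plain
in-memory dict standing in for a Blueprints graph, not the Blueprints
API): if the graph nodes are keyed by the same URI that identifies the
entity in the Entityhub, one can jump from an Entityhub entity directly
to its node. The names build_link_graph and neighbours are
hypothetical, and the sketch assumes that URIs are written in angle
brackets as in N-Triples.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def build_link_graph(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Build a directed graph of entity-to-entity links.

    Only triples whose object is a URI (written as <...>) become edges;
    literal values such as names or dates are ignored.
    """
    graph: Dict[str, Set[str]] = {}
    for subject, _predicate, obj in triples:
        if obj.startswith("<") and obj.endswith(">"):
            graph.setdefault(subject, set()).add(obj)
            # Register the target as a node even if it has no outgoing links.
            graph.setdefault(obj, set())
    return graph


def neighbours(graph: Dict[str, Set[str]], entity_id: str) -> Set[str]:
    """Look up a node directly by its entity URI; unknown IDs give an empty set."""
    return graph.get(entity_id, set())
```

Keeping the graph access behind two small functions like these would
also make it easier to swap the backend and run the same tests against
each one.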

best
Rupert

> Comments are more than welcome
>
> Regards
>
> [1] http://www.iesl.cs.umass.edu/data/wiki-links
>
> --
>
> ------------------------------
> This message should be regarded as confidential. If you have received this
> email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard copy
> by an authorised signatory.
>
> Zaizi Ltd is registered in England and Wales with the registration number
> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> London W6 7AN.



--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
