Hi Antonio,

Thank you for the nice overview.
Let me mention that, because of the following issue,

On Mon, Jul 1, 2013 at 11:36 AM, Antonio Perez <ape...@zaizi.com> wrote:
>
> The indexing tool takes too much time in a standard computer, so in order
> to execute this process, you'll need either a computer with SSD or
> a computer with 200GB of RAM in order to deal with the whole Freebase data
> dump in memory.
>

Rafa has started to work on an IndexingSource that can operate directly
on the Freebase dump (or on any single-file RDF dump that is sorted by SPO).
With such a source one can index a dataset without first importing the data
into an RDF triple store. As the import is the most hardware-demanding part
of the chain, this should greatly improve indexing performance. However,
this IndexingSource will not support LDPath and will therefore not support
some of the available EntityProcessors.

>
> For the next milestone (midterm evaluation) the following tasks need to be
> done:
> 1. Convert wiki-links data dump to RDF
>    * Wiki-links contains a lot of disambiguation information which it is
> wanted to incorporate to the Entityhub Freebase site.
>    * The wiki-link data dump will be converted to RDF to be easier to
> process by the new Stanbol Freebase indexing tool (point 2)
>    * The wiki-link expanded dataset [1] will be used because it contains
> information like extracted context for the mentions, alignment to Freebase
> entities, etc.
> 2. Develop a new stanbol indexer to join Freebase and wiki-links
> information

The expanded dataset [1] is really great in that it allows us to avoid a lot
of very time-consuming tasks (crawling the resources, extracting the mention
text and context, and linking the DBpedia URIs to Freebase). Without this
information, using this great dataset would not be feasible because of time
constraints.

> 3. Generate a graph with the links in Freebase
>    * To support Graph-based disambiguation algorithms in Stanbol, a graph
> will be generated using Blueprints Neo4j and every node in the graph will
> be associated to entries in the EntityHub to later be used to position
> directly in a node on the graph.
>

IMO this is really interesting, and not only for disambiguation. I am really
looking forward to this. Do not forget to also test the code with backends
that are compatible with the Apache License.

best
Rupert

> Comments are more than welcome
>
> Regards
>
> [1] http://www.iesl.cs.umass.edu/data/wiki-links
>

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
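
P.S. To illustrate why an SPO-sorted dump can be indexed without a triple
store, here is a minimal sketch in Python. It is not the actual Stanbol
IndexingSource (which is Java); the function names, the naive line parser,
and the example.org URIs are all assumptions made for illustration. The
point is only that, when triples are sorted by subject, all statements about
one entity arrive consecutively, so a single streaming pass can emit
complete entities while holding just one entity in memory at a time.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]


def parse_line(line: str) -> Triple:
    """Naive split of one N-Triples-like line (hypothetical helper).

    Splits on the first two runs of whitespace; the remainder is treated
    as the object, with the trailing statement terminator '.' removed.
    """
    subject, predicate, rest = line.strip().split(None, 2)
    obj = rest.rstrip()
    if obj.endswith("."):
        obj = obj[:-1].rstrip()
    return subject, predicate, obj


def entities(lines: Iterable[str]) -> Iterator[Tuple[str, List[Triple]]]:
    """Yield (subject, triples) pairs from a subject-sorted stream.

    Relies on the input being sorted by subject: groupby only merges
    consecutive items, so an unsorted input would split one entity
    into several groups.
    """
    triples = (
        parse_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    )
    for subject, group in groupby(triples, key=lambda t: t[0]):
        yield subject, list(group)


# Made-up sample data, already sorted by subject.
sample = [
    '<http://example.org/a> <http://example.org/name> "Entity A" .',
    '<http://example.org/a> <http://example.org/link> <http://example.org/b> .',
    '<http://example.org/b> <http://example.org/name> "Entity B" .',
]

for subject, triples in entities(sample):
    print(subject, len(triples))
```

In practice `lines` would be a file handle over the (possibly compressed)
dump rather than a list, so memory use stays bounded by the largest single
entity. The trade-off is the one mentioned above: without a triple store
there is no way to follow paths across entities, which is why LDPath-based
processing is not available with such a source.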