Rupert,
Is the Freebase index available for use? Where can I get it?
I could then compare the entities I get from DBpedia and Freebase on some of
my test files.

Is [2]
http://googleresearch.blogspot.com.es/2012/05/from-words-to-concepts-and-back.html
used by any engine in Stanbol now?
Are there any programming APIs available to access the concept dictionary?

-harish


On Sat, Apr 27, 2013 at 7:35 AM, Rupert Westenthaler <
rupert.westentha...@gmail.com> wrote:

> Hi Antonio
>
> First of all thx for your interest!
>
> On Thu, Apr 25, 2013 at 4:04 PM, Antonio Perez <ape...@zaizi.com> wrote:
> > Hi everybody
> >
> > I'm Antonio David Pérez, a new Zaizi team member and an MSc student at
> > the University of Seville. Lately, I've been involved in the development
> > of a semantic CMS solution at a Spanish company called Ximdex, working
> > with several technologies like Apache Nutch, Apache Solr and also Apache
> > Stanbol.
> >
> > Currently, I've been assigned to a project that involves different
> > technologies like Apache Stanbol and Apache ManifoldCF. So, related to
> > Stanbol, I'm interested in the disambiguation problem, so I would like to
> > prepare a proposal for GSoC about this topic.
> >
>
> If you already have some experience with Apache Stanbol, this
> would for sure be a big help for a GSoC project.
>
> > I have been following the latest mails about disambiguation and the WebID
> > protocol. I would be more interested in developing disambiguation systems
> > within Stanbol using the major semantic knowledge bases. Actually, my
> > initial idea is to use Freebase, with the aim of making it extensible to
> > any other database like Wikipedia and DBpedia. Following STANBOL-1037 [1],
> > the main goal is to implement a couple of global-approach disambiguation
> > algorithms to be used in Stanbol.
> >
>
> Disambiguation on "World Domain" datasets is a very important feature
> for a lot of usage scenarios. So definitely very interesting and
> relevant for Apache Stanbol.
>
> > For this, I would like to discuss some topics about the proposal:
> >
> > - Knowledge Base: I have decided to stick first to Freebase, because it
> > has a REST API allowing 100k calls per day for read and 10k for write.
> > Besides the REST API, an alternative could be to integrate the whole
> > Freebase graph in Stanbol and use their Java API to manage it. Ideally,
> > the management framework should be valid for other knowledge bases such
> > as Wikipedia or DBpedia.
> >
>
> I recently created my first Freebase index for Stanbol (see
> STANBOL-1014 for the Indexing tool). First tests on an index with all
> Freebase Topics and all languages have shown very nice results! IMO
> Freebase is currently for sure the better choice over DBpedia. However
> one needs to see/wait how Freebase compares to the Wikidata project
> [4] that only recently entered phase 2.
>
> Designing disambiguation in a way that it can be applied to other
> datasets would be for sure a great bonus. But given the good results
> one can get with Freebase I would even be very interested if the
> results would only work on Freebase ^^
>
> > - Resources: As has been pointed out before on the mailing lists, Google
> > has released a couple of resources to be used in disambiguation
> > applications. One is a dictionary of concepts from Wikipedia, using anchor
> > text labels in Wikipedia internal links to create an index of possible
> > entity names [2]. The second one is a dataset of texts that link to
> > concepts in Wikipedia [3], which can be used as disambiguation contexts
> > according to STANBOL-1037. I need to research whether similar information
> > can be retrieved directly from Freebase or, in other words, to check
> > whether this information is already incorporated in Freebase.
> >
>
> I think you can even use [2] and [3] for disambiguation on top of
> Freebase as there is anyway a mapping between Freebase and DBpedia
> concepts. However you will likely need a higher quality mapping than
> is currently available. Because of that I would suggest you start
> off with implementing STANBOL-1046 [5]. For possible names (or surface
> forms as they are also often called) one can use the Alias in
> Freebase. However AFAIK there is no information available in Freebase
> similar to [3]. Related to this I found however an interesting paper
> [6]. The semi-supervised approach suggested in chapter III could
> nicely work. Especially if one considers that users could manually
> disambiguate Entities. In combination with other mentions extracted by
> the Stanbol Enhancer this could be used to acquire the required data.
>
> > Moreover, the proposal design will try to be as generic as possible in
> > order to be adaptable to any other Knowledge Base.
> >
>
> Disambiguation is not something easy and making something "generic"
> makes it even harder. So IMO having one/several more specific options
> would not hurt a GSoC proposal. It would also make it easier to
> evaluate the proposal.
>
> > Waiting for your comments and valuable suggestions.
> >
>
> Hope my comments provided at least some valuable information.
>
> best
> Rupert
>
> References:
>
> > [1] https://issues.apache.org/jira/browse/STANBOL-1037
> > [2] http://googleresearch.blogspot.com.es/2012/05/from-words-to-concepts-and-back.html
> > [3] https://code.google.com/p/wiki-links/
> [4] https://www.wikidata.org/wiki/Wikidata:Main_Page
> [5] https://issues.apache.org/jira/browse/STANBOL-1046
> [6] http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/38389.pdf
>
>
>
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
Thanks
Harish
