Re: entity linking vs classification

Bhoomin Pandya Wed, 24 Sep 2014 21:28:40 -0700

Hi Maatari,

I am answering based on what I gather from your question and your
discussion with Rupert. I agree completely with Rupert. Please
correct me if you feel I am wrong in any way.


Stanbol has many amazing features and offers end-to-end enhancement,
which makes it a complete enhancement engine with a RESTful API.
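
As a rough illustration of the RESTful side, the sketch below builds a
plain-text POST request for the enhancer using only the Python standard
library. The URL http://localhost:8080/enhancer assumes a default local
launcher, and the helper names build_enhancer_request and enhance are made
up for this example; set the Accept header to whichever RDF serialization
your installation supports.

```python
import urllib.request

# Assumption: a default Stanbol launcher listening on localhost:8080.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, accept="application/rdf+xml", url=ENHANCER_URL):
    """Build (but do not send) a POST request carrying plain text."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,
        },
        method="POST",
    )


def enhance(text, accept="application/rdf+xml", url=ENHANCER_URL):
    """Send the text and return the serialized enhancements as a string."""
    request = build_enhancer_request(text, accept, url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling enhance("Bob Marley was born in Jamaica.") needs a running
server; build_enhancer_request can be inspected without one.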

Entities generated by the Enhancer can be used in various ways. They
are also very useful for keyword and SEO improvements. The
Enhancer/Contenthub outputs the enhancements for text data directly
as RDF/XML or N-Triples, which is a rather unique feature. Because
N-Triples identify subject, predicate and object by URI, the data can
be assembled even when those resources live on different servers.
This is a great feature for linking, besides SPARQL endpoints.
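
To make the point about different servers concrete, here is a minimal
sketch that reads the simplest kind of N-Triples statement (three IRIs
and a final dot) and prints the host of each term. The sample triple is
the schema:about example from this thread with example.org as a
placeholder subject; the regular expression is a deliberate
simplification and not a full N-Triples parser.

```python
import re
from urllib.parse import urlparse

# Matches only statements made of three IRIs followed by a dot.
# Literals and blank nodes are out of scope for this sketch.
IRI_TRIPLE = re.compile(r"^\s*<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def parse_iri_triples(ntriples):
    """Return (subject, predicate, object) tuples for IRI-only lines."""
    triples = []
    for line in ntriples.splitlines():
        match = IRI_TRIPLE.match(line)
        if match:
            triples.append(match.groups())
    return triples


def hosts(triple):
    """Return the host name of each term in a triple."""
    return tuple(urlparse(term).netloc for term in triple)


# Hypothetical sample data, not actual enhancer output.
SAMPLE = """\
<http://example.org/resource> <http://schema.org/about> <http://dbpedia.org/resource/Bob_Marley> .
"""

for triple in parse_iri_triples(SAMPLE):
    print(hosts(triple))  # ('example.org', 'schema.org', 'dbpedia.org')
```

For real data, a proper RDF library is the safer choice, since literals,
escapes and blank nodes are not handled here.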

In Stanbol, OWL files are managed with OntoNet so that they can later
be consumed by reasoning services, refactorers and rule engines. OWL is
a knowledge representation language and uses taxonomies, hierarchies
etc. for classification/categorization (as Rupert mentioned). E.g., a
site map could be constructed using ontologies.

SKOS is knowledge organization (front-end, middleware and back-end
level) at the concept level, which includes collections of concepts
and the relationships between them, and it covers all controlled
vocabularies. It is in fact semantic integration.

Your problem seems to be linked to some deployment process, either for
a site or for site SEO. To me, the basics used for content enhancement
are also useful for SEO. You can use the schema.org vocabulary along
with the microdata, RDFa, or JSON-LD formats to add information to
your HTML content (as mentioned on their site). I feel that
schema.org is meant more for annotating HTML content than for
producing triples from the content, but please check this.
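
One possible way to bridge the two, sketched under assumptions: once an
application has accepted an entity suggestion, it can emit the
schema:about statement as JSON-LD inside the page, where search engines
can see it. The function name about_markup and the example.org page URL
are hypothetical; the entity URI is the Bob Marley example from this
thread.

```python
import json


def about_markup(page_url, entity_uris, work_type="CreativeWork"):
    """Return a <script> element with JSON-LD linking a page to entities."""
    doc = {
        "@context": "https://schema.org",
        "@type": work_type,
        "@id": page_url,
        "about": [{"@id": uri} for uri in entity_uris],
    }
    # Escape "</" so a value can never close the script element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


print(about_markup(
    "http://example.org/resource",
    ["http://dbpedia.org/resource/Bob_Marley"],
))
```

The resulting element goes into the HTML of the page itself. Whether the
same triples are also written back to a store is a separate choice for
the application.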

I hope this answers your query. Please let me know your views.

Many Thanks
Bhoomin Pandya

On Thu, Sep 25, 2014 at 4:37 AM, Maatari Daniel Okouya
<okouy...@yahoo.fr> wrote:
> It all makes sense now, thanks.
>
> One thing that I do not really comprehend, although I may have an idea of how
> to hack it, is how you go from the standard description based on the
> Stanbol vocabulary to producing, let's say, a triple with the schema.org
> vocabulary: ex:resource schema:about    dbPedia:Bob_Marley.
>
> I would just appreciate to understand the vision behind it. That is, how did 
> the Stanbol team envision the best practice to produce that.
>
> Is it something that the application that uses Stanbol should do? Upon
> receiving some suggestions and being OK with the tag, the resource should be
> marked as such. But should the new triples go back into Stanbol, or another
> store? I’m not sure.
>
> I’m not sure I properly understand how the tagging of the enhancer is used.
> Is it intended for developers to then build an appropriate description based
> on it? Because if one wants to optimise one's description for search engine
> optimisation, then the Stanbol descriptions are invisible to Google, for
> instance.
>
>
> Could anyone help to clarify the main idea here?
>
> Many thanks,
>
> M
>
>
> --
> Maatari Daniel Okouya
> Sent with Airmail
>
> On 23 Sep 2014 at 08:09:27, Rupert Westenthaler 
> (rupert.westentha...@gmail.com) wrote:
>
> Hi Maatari
>
> Not sure if I fully understand your questions ...
>
> ad (1), (2):
>
> * Entity Linking does use "surface forms" to detect mentions of those.
> "Surface Forms" are the strings used to refer to an Entity within a
> text. So typically the labels of an Entity are used as "Surface Forms"
> for linking.
> * Named Entity Recognition is most often done with Machine Learning.
> However also some rule based systems are in use. In case of Machine
> Learning you need to provide a training set. So if you want to detect
> Entities of a specific type you will need to provide a training set.
> Annotate ~1000 occurrences and you will start to get a usable
> model.
> * For Categorization it is the same. The classifiers used for this
> task also require training data. You will need to manually classify
> documents for your categories. In this case think about ~40 documents
> per category.
>
> ad (3): Sorry I do not understand your question. Just let me answer to
>
>> Can the enhancement indeed categorise according to non-SKOS instances that
>> are in an external dataset?
>
> The TopicAnnotationEngine [1] in Stanbol does not require SKOS. You
> can also define concepts by names (see page 7). SKOS is supported (see
> page 8) but not required. The critical thing is not to define the
> concepts but to provide the training data ^^
>
> best
> Rupert
>
> [1] http://stanbol.apache.org/presentations/Topic-Classification.pdf
>
>
> On Mon, Sep 22, 2014 at 4:37 PM, Maatari Daniel Okouya
> <okouy...@yahoo.fr> wrote:
>> I understand better.
>>
>> I think the key sentence here was: “Important is that Entity Linking 
>> requires an actual mention of the
>> Entity in the text while categories do not depend on such mentions. "
>>
>>
>> -So basically whether the category is based on a SKOS dataset or not, this
>> does not matter at all !!!
>>
>> -In both cases they link to a dataset; it does not matter if it is SKOS based
>> or not. The difference is how the entity to which we link comes up.
>>
>>
>>
>> A few questions here, if you don’t mind. I’m not trying to reinvent things
>> here, but simply to better understand things so I can use the tool properly.
>>
>>
>> 1) How would the information of a specific category set be fetched? The
>> process of linking in categorisation must be different, in that you do not
>> have the type to guide you. You may well end up with synonyms; without the
>> type, errors would occur. I can see why using a controlled vocabulary would
>> be easier. There, the disambiguation is within the label directly.
>> Would you confirm my assumption here? That categorisation with a SKOS-based
>> dataset (thesaurus) is easier?
>>
>> 2) Is the reason for Named Entity Recognition limiting itself to these
>> three specific types “pertinence”? Also, would these types be customisable,
>> meaning could you have a few more types?
>>
>>
>>
>> 3) What I want to achieve is describing some content resource according to
>> schema.org. For CreativeWork, it has the property “schema:about”, which must
>> point to a “schema:Thing”. I presume from that that Google is expecting here
>> something other than a controlled concept. I’m not saying that it is not
>> possible. In the same way, with foaf:topic, which I would also use, I want to
>> point to the real thing rather than a controlled vocabulary concept. I would
>> rather use dc:subject for the skos:Concept. Does it make sense? Can the
>> enhancement indeed categorise according to non-SKOS instances that are in
>> an external dataset?
>>
>>
>> Many thanks,
>>
>> Maatari
>>
>>
>>
>> --
>> Maatari Daniel Okouya
>> Sent with Airmail
>>
>> On 22 Sep 2014 at 06:49:14, Rupert Westenthaler 
>> (rupert.westentha...@gmail.com) wrote:
>>
>> Hi Maatari,
>>
>> On Mon, Sep 22, 2014 at 8:22 AM, Maatari Daniel Okouya
>> <okouy...@yahoo.fr> wrote:
>>> I’m a bit confused about a few concepts. Could someone clarify them a bit?
>>>
>>>
>>> When it comes to assigning some topics to a content resource, what would be 
>>> the difference between entity linking and categorization ?
>>>
>>
>> First let's explain the terminology as used by Stanbol. For that I will
>> use one of today's headlines:
>>
>> "Lewis Hamilton not thinking about title after winning Singapore GP"
>>
>> Named Entity Recognition: Detects mentions of Entity types within the
>> text. Typically Persons, Organizations and Locations
>> * Lewis Hamilton -> person
>> * Singapore -> location
>>
>> Entity Linking: Detects mentions of known Entities within the processed Text
>> * Lewis Hamilton -> http://en.wikipedia.org/wiki/Lewis_Hamilton
>> * Singapore Grand Prix -> http://en.wikipedia.org/wiki/Singapore_Grand_Prix
>>
>> Categorization: Assigns the content to a fixed set of categories.
>> Categories might be hierarchical. A typical example are the IPTC Media
>> Topics [1] which I will use for this example.
>> * sport -> http://cv.iptc.org/newscodes/mediatopic/15000000
>> * Formula One -> http://cv.iptc.org/newscodes/mediatopic/20000994
>>
>> Important is that Entity Linking requires an actual mention of the
>> Entity in the text while categories do not depend on such mentions.
>>
>>> What I see as of now, within some well-established tools, is the
>>> classification part. Usually it makes use of a controlled vocabulary to
>>> classify the content. Output = resource dc:subject controlledVocabularyTerm
>>>
>>> However, what I also see in the description of content resources online,
>>> on some authority websites, is linking the document to an external non-SKOS
>>> resource via, for instance, foaf:topic.
>>>
>>> In that second case, do we have both an entity linking and a classification?
>>> Or is it that both are the same, and it is just that the knowledge base
>>> changes, from an external source to a controlled vocabulary? That would mean
>>> that in the world of linked data, content classification / categorization
>>> includes entity linking. In that case I would say that the same was
>>> happening when linking to a controlled vocabulary term.
>>>
>>
>> IMO the properties used to represent analysis results do not
>> necessarily indicate if the results express linked entities or
>> categorizations. Based on their definitions, both dc:subject and
>> foaf:topic should be used for categories.
>>
>>>
>>> I'm a little confused here. If someone could clarify these notions, I would
>>> appreciate it.
>>
>> hope this helps
>> best
>> Rupert
>>
>> [1] http://cv.iptc.org/newscodes/mediatopic
>>
>> --
>> | Rupert Westenthaler rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11 ++43-699-11108907
>> | A-5500 Bischofshofen
>> | REDLINK.CO 
>> ..........................................................................
>> | http://redlink.co/
>
>
>
> --
> | Rupert Westenthaler rupert.westentha...@gmail.com
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO 
> ..........................................................................
> | http://redlink.co/
