Integrate topic categorizer to be implemented by Stanbol
--------------------------------------------------------

                 Key: NXSEM-44
                 URL: https://jira.nuxeo.com/browse/NXSEM-44
             Project: Nuxeo Semantic R&D
          Issue Type: New Feature
            Reporter: Olivier Grisel
            Assignee: Olivier Grisel


Integrate the 50,000-topic categorizer prototyped for the IKS early adopters
meeting in July.

https://issues.apache.org/jira/browse/STANBOL-197

Once this is implemented on the Stanbol side, the Nuxeo connector should be
used to classify documents by their top 3 topics (in addition to extracting
entity occurrences).
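A minimal sketch of the top-3 selection step, assuming the categorizer returns a
confidence score per topic label. The function name `top_topics` and the input
format are hypothetical and are not the Stanbol API; the scores in the example
are made up for illustration.

```python
import heapq
from typing import Dict, List


def top_topics(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k topic labels with the highest scores, best first.

    `scores` is a hypothetical mapping from topic label to classifier
    confidence; the real Stanbol response format may differ.
    """
    # heapq.nlargest avoids sorting every topic, which matters when the
    # categorizer scores tens of thousands of topics per document.
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [label for label, _ in best]


# Illustrative scores only, not real categorizer output.
example_scores = {
    "Economy Crisis": 0.91,
    "European Economy": 0.84,
    "French Presidential Elections": 0.77,
    "Football": 0.12,
}
print(top_topics(example_scores))
```

Keeping `k` as a parameter means the "top 3" rule can be tuned later without
changing the connector code that consumes the result.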

For example, a document could be classified under the topics "Economy Crisis",
"European Economy" and "French Presidential Elections" while containing explicit
mentions of "Nicolas Sarkozy" and "Francois Hollande" as entities of type
"Person", and "Paris" and "France" as entities of type "Place".

This would probably require refactoring how we handle entity types: use
dynamic facets and introduce a base type called "SemanticResource" with the
subtypes "Concept" and "Entity".
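A minimal sketch of the proposed type hierarchy, written as plain Python classes
rather than Nuxeo document types or dynamic facets. The class names mirror the
ticket; `DocumentAnnotations`, `topics()` and `entities_of_type()` are
hypothetical helpers added only to show how topics and typed entities from the
example above could be kept apart.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticResource:
    """Base type shared by all semantic annotations (name from the ticket)."""
    label: str


@dataclass
class Concept(SemanticResource):
    """A document-level topic, e.g. "European Economy"."""


@dataclass
class Entity(SemanticResource):
    """An explicitly mentioned entity with a type such as Person or Place."""
    entity_type: str = "Unknown"


@dataclass
class DocumentAnnotations:
    """Hypothetical container for the annotations attached to one document."""
    resources: List[SemanticResource] = field(default_factory=list)

    def topics(self) -> List[Concept]:
        # Topics are the Concept instances, in insertion order.
        return [r for r in self.resources if isinstance(r, Concept)]

    def entities_of_type(self, entity_type: str) -> List[Entity]:
        # Entities are filtered by their type label (e.g. "Person").
        return [r for r in self.resources
                if isinstance(r, Entity) and r.entity_type == entity_type]


doc = DocumentAnnotations([
    Concept("Economy Crisis"),
    Concept("European Economy"),
    Concept("French Presidential Elections"),
    Entity("Nicolas Sarkozy", "Person"),
    Entity("Francois Hollande", "Person"),
    Entity("Paris", "Place"),
    Entity("France", "Place"),
])
print([c.label for c in doc.topics()])
print([e.label for e in doc.entities_of_type("Place")])
```

Because both subtypes share one base, code that only needs a label (display,
indexing) can treat every annotation as a `SemanticResource`, while code that
cares about the distinction checks the subtype.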

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
ECM-tickets mailing list
[email protected]
http://lists.nuxeo.com/mailman/listinfo/ecm-tickets