Tom Lomax created STANBOL-1144:
----------------------------------

             Summary: NamedEntityTaggingEngine produces invalid TextAnnotations leading to NullPointerException during enhancement
                 Key: STANBOL-1144
                 URL: https://issues.apache.org/jira/browse/STANBOL-1144
             Project: Stanbol
          Issue Type: Bug
          Components: Enhancement Engines
    Affects Versions: enhancement-engines-0.10.0
         Environment: Any
            Reporter: Tom Lomax
            Priority: Critical


Certain pieces of text, when sent to the enhancer, cause a crash in the 
Solr Yard code. For example, trying to enhance the following:

Syrian regime!"
(That's "Syrian regime" followed by an exclamation point and a double quote).

...results in a NullPointerException in the dbpediaLinking phase:
Caused by: java.lang.NullPointerException
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrQueryFactory.initTextConstraint(SolrQueryFactory.java:415)
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrQueryFactory.createIndexConstraint(SolrQueryFactory.java:330)
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrQueryFactory.parseFieldQuery(SolrQueryFactory.java:235)
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrYard.find(SolrYard.java:267)
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrYard.findRepresentation(SolrYard.java:362)
    at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:192)
    at org.apache.stanbol.entityhub.core.impl.ReferencedSiteImpl.findEntities(ReferencedSiteImpl.java:151)
    at org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine.computeEntityRecommentations(NamedEntityTaggingEngine.java:505)
    at org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine.computeEnhancements(NamedEntityTaggingEngine.java:370)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:271)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:189)
    at org.apache.felix.eventadmin.impl.tasks.HandlerTaskImpl.execute(HandlerTaskImpl.java:88)
    at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:221)
    at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:110)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:724)

I believe the cause is in 
enhancement-engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/NamedEntity::createFromTextAnnotation,
which is called by NamedEntityTaggingEngine::computeEnhancements. It verifies 
that the supplied entity name is not null or empty, but punctuation is later 
removed and trim() is called again (the cleanupKeywords method), so an empty 
entity name can still end up in the TextAnnotation being generated.
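
To illustrate the suspected failure mode, here is a minimal standalone sketch. 
It is not the Stanbol code and not the patch I will attach: the class 
EntityNameCheck, the method usableName, and the body of cleanupKeywords shown 
here are all assumptions made for illustration. The point is only that the 
emptiness check has to run after punctuation removal, not just before it.

```java
import java.util.Optional;

// Hypothetical illustration only; none of these names are Stanbol API.
final class EntityNameCheck {

    // Assumed stand-in for the cleanup step: replace ASCII punctuation
    // with spaces, then trim. The real cleanupKeywords may differ.
    static String cleanupKeywords(String raw) {
        return raw.replaceAll("\\p{Punct}", " ").trim();
    }

    // Returns the cleaned name, or empty if nothing usable remains.
    // The emptiness check is applied to the cleaned value, so a mention
    // consisting only of punctuation is rejected instead of being passed
    // on as an empty query string.
    static Optional<String> usableName(String selectedText) {
        if (selectedText == null) {
            return Optional.empty();
        }
        String cleaned = cleanupKeywords(selectedText);
        return cleaned.isEmpty() ? Optional.empty() : Optional.of(cleaned);
    }
}
```

With this sketch, usableName("Syrian regime!\"") yields "Syrian regime", while 
a punctuation-only mention such as "!\"" (a hypothetical example, I have not 
confirmed which span triggers the crash) yields an empty Optional and would be 
skipped.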

This should be an easy fix, provided this is the correct place to fix it. I 
will attach a patch.

I have reproduced this on clean installs of 0.10.0 and of trunk as of this 
morning (r1509579).


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
