[ https://issues.apache.org/jira/browse/STANBOL-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Lomax updated STANBOL-1144:
-------------------------------

    Description: 
Certain input texts, when sent to the enhancer, cause a crash in the Solr Yard 
code. For example, enhancing the following text with the standard default 
chain:

{noformat}
Syrian regime!"
{noformat}
(That is, "Syrian regime" followed by an exclamation point and a double quote.)

...results in a NullPointerException in the dbpediaLinking phase:
Caused by: java.lang.NullPointerException
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrQueryFactory.initTextConstraint(SolrQueryFactory.java:415)
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrQueryFactory.createIndexConstraint(SolrQueryFactory.java:330)
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrQueryFactory.parseFieldQuery(SolrQueryFactory.java:235)
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrYard.find(SolrYard.java:267)
    at org.apache.stanbol.entityhub.yard.solr.impl.SolrYard.findRepresentation(SolrYard.java:362)
    at org.apache.stanbol.entityhub.core.site.CacheImpl.findRepresentation(CacheImpl.java:192)
    at org.apache.stanbol.entityhub.core.impl.ReferencedSiteImpl.findEntities(ReferencedSiteImpl.java:151)
    at org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine.computeEntityRecommentations(NamedEntityTaggingEngine.java:505)
    at org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine.computeEnhancements(NamedEntityTaggingEngine.java:370)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.processEvent(EnhancementJobHandler.java:271)
    at org.apache.stanbol.enhancer.jobmanager.event.impl.EnhancementJobHandler.handleEvent(EnhancementJobHandler.java:189)
    at org.apache.felix.eventadmin.impl.tasks.HandlerTaskImpl.execute(HandlerTaskImpl.java:88)
    at org.apache.felix.eventadmin.impl.tasks.SyncDeliverTasks.execute(SyncDeliverTasks.java:221)
    at org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run(AsyncDeliverTasks.java:110)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:724)

I believe the cause is in 
enhancement-engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/NamedEntity::createFromTextAnnotation,
 which is called by NamedEntityTaggingEngine::computeEnhancements. That method 
verifies that the supplied entity name is not null or empty, but afterwards 
removes punctuation and calls trim() again (in the cleanupKeywords method). A 
name that passed the check can therefore become empty, and that empty entity 
name is then included in the generated TextAnnotation.
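The ordering problem described above (validate first, clean up afterwards) can be illustrated with a small, self-contained Java sketch. The class and method names here (NamedEntityNameCheck, createNameUnchecked, createNameChecked) are hypothetical, and the cleanupKeywords body is an assumption (strip punctuation, collapse whitespace, trim); the real Stanbol code may strip different characters. The sample name !" is chosen purely for illustration; the report does not say which name the NER engine actually produced.

```java
/**
 * Hypothetical sketch of the check-then-cleanup problem from STANBOL-1144.
 * Method names loosely mirror the report, but the bodies are illustrative
 * assumptions, NOT Stanbol's actual implementation.
 */
class NamedEntityNameCheck {

    // Assumed cleanup: replace POSIX punctuation with spaces,
    // collapse runs of whitespace, then trim.
    static String cleanupKeywords(String keywords) {
        return keywords.replaceAll("\\p{Punct}", " ")
                       .replaceAll("\\s+", " ")
                       .trim();
    }

    // Problematic order: the null/empty check runs on the raw name,
    // so a name made only of punctuation passes and is cleaned to "".
    static String createNameUnchecked(String name) {
        if (name == null || name.trim().isEmpty()) {
            return null; // rejected: no entity would be created
        }
        return cleanupKeywords(name);
    }

    // Safer order: clean up first, then validate the value actually used.
    static String createNameChecked(String name) {
        if (name == null) {
            return null;
        }
        String cleaned = cleanupKeywords(name);
        return cleaned.isEmpty() ? null : cleaned;
    }

    public static void main(String[] args) {
        String[] samples = {"Syrian regime", "!\"", "  "};
        for (String s : samples) {
            // %s renders a null result as the string "null"
            System.out.printf("[%s] unchecked=[%s] checked=[%s]%n",
                    s, createNameUnchecked(s), createNameChecked(s));
        }
    }
}
```

For the input !" the unchecked variant returns an empty string while the checked variant returns null, which a caller could treat as "no entity" rather than passing an empty name on to the Solr Yard query.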

This should be an easy fix, provided this is the correct place to fix it. I 
will attach a patch.

I have reproduced this on clean installs of 0.10.10 and of trunk as of this 
morning (r1509579).


> NamedEntityTaggingEngine produces invalid TextAnnotations leading to NullPointerException during enhancement
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-1144
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1144
>             Project: Stanbol
>          Issue Type: Bug
>          Components: Enhancement Engines
>    Affects Versions: enhancement-engines-0.10.0
>         Environment: Any
>            Reporter: Tom Lomax
>            Priority: Critical
>              Labels: easyfix, patch, security
>         Attachments: NamedEntityFix.diff
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
