[ 
https://issues.apache.org/jira/browse/OPENNLP-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lewis updated OPENNLP-738:
--------------------------------
    Attachment: AbstractDataIndexer.java-NPE.patch

> AbstractDataIndexer#sortAndMerge sets up callers for a NullPointerException
> ---------------------------------------------------------------------------
>
>                 Key: OPENNLP-738
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-738
>             Project: OpenNLP
>          Issue Type: Bug
>            Reporter: Chris Lewis
>         Attachments: AbstractDataIndexer.java-NPE.patch
>
>
> In its constructor, the {{OnePassDataIndexer}} calls {{sortAndMerge}} of its 
> parent class, {{AbstractDataIndexer}} (source file 
> {{opennlp-tools/src/main/java/opennlp/tools/ml/model/AbstractDataIndexer.java}}).
>  A quick read through the source of these two classes shows that the member 
> variable {{contexts}} is only initialized by this method, otherwise it 
> remains {{null}}. Note that when {{sort}} is {{true}} (which it is as 
> called) and there are fewer than two events, the method returns early, 
> leaving {{contexts}} uninitialized. Note also that {{getContexts}} 
> exposes this variable, and that {{GIS.trainModel}} delegates to the 
> {{trainModel}} method of {{GISTrainer}}. Line 263 attempts to read 
> {{contexts.length}}; {{contexts}} will be {{null}} in the case of fewer 
> than two events in the stream, and this results in a 
> {{NullPointerException}}.
> I'm not an expert in the algorithms relying on this code, but 
> [some|http://comments.gmane.org/gmane.comp.apache.opennlp.user/564] 
> [googling|http://blog.gmane.org/gmane.comp.apache.opennlp.user/month=20140501]
>  shows a few incidents that lead back to this behavior, including at least 
> the tickets OPENNLP-316 and OPENNLP-488. It may be the case that all uses of 
> this code cannot possibly function correctly without >= 2 events, but I don't 
> know that. As such, being the non-expert on the natural constraints of the 
> inputs to {{sortAndMerge}}, I'd like to suggest 2 possible improvements: 1) 
> default the {{contexts}} and other private arrays that are set in the >= 2 
> path of this code to non-null defaults or 2) throw an explicit 
> {{IllegalArgumentException}} that states >= 2 events are required for the 
> calculation.
> The latter is not as desirable as the former (for which I've attached a 
> patch), but at least it provides a targeted, unambiguous reason for why an 
> exception is being thrown.
> Also I apologize for not specifying the version or component, as I'm not 
> clear on how the project source is organized with respect to the published 
> artifacts. This issue is present in trunk, whose parent pom claims a 
> version of {{1.6.1-SNAPSHOT}}.
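
The two suggestions in the description can be sketched roughly as follows. This is neither the attached patch nor the OpenNLP source: {{SketchIndexer}}, its {{requireTwoEvents}} flag, and the simplified {{sortAndMerge}} signature are hypothetical stand-ins, and the real sorting and merging logic is left out. The sketch only shows how a non-null default (option 1) or an explicit {{IllegalArgumentException}} (option 2) would replace the later {{NullPointerException}}.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the indexer described above; not the OpenNLP class. */
class SketchIndexer {
    // Option 1: start from a non-null default so callers can always read contexts.length.
    private int[][] contexts = new int[0][];

    // Option 2 is enabled by this (assumed) flag: reject inputs with fewer than two events.
    private final boolean requireTwoEvents;

    SketchIndexer(boolean requireTwoEvents) {
        this.requireTwoEvents = requireTwoEvents;
    }

    /** Simplified signature; the sort/merge work itself is omitted in this sketch. */
    int sortAndMerge(List<int[]> events, boolean sort) {
        if (requireTwoEvents && events.size() < 2) {
            // Targeted, unambiguous failure instead of a later NullPointerException.
            throw new IllegalArgumentException(
                "At least 2 events are required, got " + events.size());
        }
        if (sort && events.size() <= 1) {
            // Early return as described in the report; contexts keeps its empty default.
            return events.size();
        }
        // Placeholder for the real work: here the events are only copied into the array.
        contexts = events.toArray(new int[0][]);
        return contexts.length;
    }

    int[][] getContexts() {
        return contexts;
    }
}

class SketchIndexerDemo {
    public static void main(String[] args) {
        SketchIndexer lenient = new SketchIndexer(false);
        lenient.sortAndMerge(new ArrayList<int[]>(), true);
        // Safe with the default in place: no NullPointerException on .length.
        System.out.println("contexts.length = " + lenient.getContexts().length);

        try {
            new SketchIndexer(true).sortAndMerge(new ArrayList<int[]>(), true);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the default in place, a trainer that reads {{getContexts().length}} sees a length of zero rather than failing. Whether an empty model is meaningful downstream is the open question raised above, which is why the explicit exception is kept as the alternative.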



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
