[
https://issues.apache.org/jira/browse/OPENNLP-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Lewis updated OPENNLP-738:
--------------------------------
Attachment: AbstractDataIndexer.java-NPE.patch
> AbstractDataIndexer#sortAndMerge sets up callers for a NullPointerException
> ---------------------------------------------------------------------------
>
> Key: OPENNLP-738
> URL: https://issues.apache.org/jira/browse/OPENNLP-738
> Project: OpenNLP
> Issue Type: Bug
> Reporter: Chris Lewis
> Attachments: AbstractDataIndexer.java-NPE.patch
>
>
> In its constructor, the {{OnePassDataIndexer}} calls {{sortAndMerge}} of its
> parent class, {{AbstractDataIndexer}} (source file
> {{opennlp-tools/src/main/java/opennlp/tools/ml/model/AbstractDataIndexer.java}}).
> A quick read through the source of these two classes shows that the member
> variable {{contexts}} is only initialized by this method; otherwise it
> remains {{null}}. Note that when {{sort}} is {{true}} (which it is as
> called) and there are fewer than two events, the method returns early,
> leaving {{contexts}} uninitialized. Note also that {{getContexts}}
> exposes this variable, and that {{GIS.trainModel}} delegates to the
> {{trainModel}} method of {{GISTrainer}}. Line 263 attempts to read
> {{contexts.length}}, but {{contexts}} will be {{null}} when there are fewer
> than two events in the stream, which results in a {{NullPointerException}}.
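> To make the failure path concrete, here is a minimal, self-contained sketch
> of the pattern. {{SketchIndexer}} and {{SketchTrainer}} are hypothetical
> stand-ins, not the real OpenNLP classes; only the shape of the control flow
> (early return before the field is assigned, then a caller reading
> {{.length}}) is taken from the description above.
>
```java
import java.util.List;

// Hypothetical stand-in for the indexer; NOT the real OpenNLP class.
class SketchIndexer {
    // Stays null unless sortAndMerge reaches the assignment below.
    private int[][] contexts;

    SketchIndexer(List<int[]> events) {
        sortAndMerge(events, true);
    }

    private void sortAndMerge(List<int[]> events, boolean sort) {
        if (sort && events.size() < 2) {
            return; // early return: contexts is never assigned
        }
        contexts = events.toArray(new int[0][]);
    }

    int[][] getContexts() {
        return contexts;
    }
}

// Hypothetical stand-in for the trainer that consumes the indexer.
class SketchTrainer {
    static int countContexts(SketchIndexer indexer) {
        int[][] contexts = indexer.getContexts();
        // Throws NullPointerException when the indexer saw fewer than two events.
        return contexts.length;
    }
}
```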
> I'm not an expert in the algorithms relying on this code, but
> [some|http://comments.gmane.org/gmane.comp.apache.opennlp.user/564]
> [googling|http://blog.gmane.org/gmane.comp.apache.opennlp.user/month=20140501]
> shows a few incidents that lead back to this behavior, including at least
> the tickets OPENNLP-316 and OPENNLP-488. It may be that no use of this code
> can function correctly with fewer than two events, but I don't know that.
> As such, not being an expert on the natural constraints of the inputs to
> {{sortAndMerge}}, I'd like to suggest two possible improvements: 1)
> default the {{contexts}} and other private arrays that are set in the >= 2
> path of this code to non-null values, or 2) throw an explicit
> {{IllegalArgumentException}} stating that >= 2 events are required for the
> calculation.
> The latter is not as desirable as the former (for which I've attached a
> patch), but at least it provides a targeted, unambiguous reason for why an
> exception is being thrown.
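> As a rough illustration of the two options (this is not the attached patch,
> and {{DefaultingIndexer}} and {{StrictIndexer}} are hypothetical names that
> do not exist in OpenNLP), the difference could look something like this:
>
```java
import java.util.List;

// Option 1 (hypothetical sketch): give the field a non-null default so that
// callers see an empty array instead of null when there are too few events.
class DefaultingIndexer {
    private int[][] contexts = new int[0][];

    DefaultingIndexer(List<int[]> events) {
        if (events.size() < 2) {
            return; // early return is now harmless: contexts is already non-null
        }
        contexts = events.toArray(new int[0][]);
    }

    int[][] getContexts() {
        return contexts;
    }
}

// Option 2 (hypothetical sketch): fail fast with a targeted message instead
// of letting a later caller hit a NullPointerException.
class StrictIndexer {
    private final int[][] contexts;

    StrictIndexer(List<int[]> events) {
        if (events.size() < 2) {
            throw new IllegalArgumentException(
                "At least 2 events are required, but got " + events.size());
        }
        contexts = events.toArray(new int[0][]);
    }

    int[][] getContexts() {
        return contexts;
    }
}
```
>
> With the first variant a caller reading {{getContexts().length}} simply sees
> zero; with the second, the failure surfaces at construction time with a
> message that names the actual constraint.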
> Also, I apologize for not specifying the version or component, as I'm not
> clear on how the project source is organized with respect to the published
> artifacts. This issue is present in trunk, whose parent pom claims a version
> of {{1.6.1-SNAPSHOT}}.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)