Let's see,

We had OPENNLP-297, 298 ....
I think OPENNLP-333, 417 may also apply.

We can fix this, but ... we can't modify the indexes returned by the Span class. All the other code relies on the odd state of the indexes being there.

The Span.spansToStrings() function can probably be modified to address this ... I'll have a look and see what is wrong.
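
One way a conversion could tolerate empty spans is sketched below. This is only an illustration under stated assumptions: TolerantSpanStrings and toStrings are hypothetical names, spans are plain {start, end} pairs with an exclusive end, and none of this is the OpenNLP source or the eventual fix. It returns an empty string for an empty span instead of throwing, which hides the symptom but does not correct the span itself.

```java
import java.util.Arrays;

// Hypothetical helper, not part of OpenNLP.
final class TolerantSpanStrings {

    // Each span is {start, end} in token indexes, with end exclusive.
    static String[] toStrings(int[][] spans, String[] tokens) {
        String[] out = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            // copyOfRange yields an empty array for start == end,
            // and String.join on an empty array yields "".
            String[] covered = Arrays.copyOfRange(tokens, spans[i][0], spans[i][1]);
            out[i] = String.join(" ", covered);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"azestapine", "treatment", "is", "10",
                           "times", "more", "effective", "."};
        int[][] spans = {{3, 4}, {6, 6}};
        for (String s : toStrings(spans, tokens)) {
            System.out.println("'" + s + "'");
        }
    }
}
```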

James Kosin



On 2/20/2013 8:25 PM, James Kosin wrote:
Jim,

I thought I worked on a related issue. Could you check to see if it was fixed in trunk?

Thanks,
James Kosin

On 2/20/2013 9:52 AM, Jim foo.bar wrote:
To be honest, I couldn't remember if I opened a ticket for it, so I had a quick look through JIRA but couldn't find any related ones. I will open the ticket this afternoon and provide a patch as well.

Jim


On 20/02/13 13:57, Jörn Kottmann wrote:
Did you open a JIRA issue for it as suggested by Lance? Do you recall the issue number?
We should get the fix for it into the 1.5.3 release.

Jörn

On 02/20/2013 02:14 PM, Jim foo.bar wrote:
A bit of googling and I managed to locate the thread from November! Here it is:

http://mail-archives.apache.org/mod_mbox/opennlp-users/201211.mbox/%3c509bec16.7050...@gmail.com%3E

I reported it and fixed it back then, but I can't remember whether I communicated my fix to you guys. I'll investigate my private fork, try to spot the differences, and let you know what happens. I think it was a minor bug: there was a '-1' somewhere, if I'm not mistaken.

Jim


On 20/02/13 13:00, Jim foo.bar wrote:
OK, sorry, I rushed earlier. Now I remember what happened 8-9 months ago. It's not Span.spansToStrings() that has the problem but the RegexNameFinder instead! Calling the .find method of the RegexNameFinder returns spans of the form I mentioned earlier (#<Span [3..3)>). I do remember fixing this, but I'm not sure I submitted a patch. Can anyone shed some light, or should I go back and diff my sources?

Jim


On 20/02/13 12:16, Jim foo.bar wrote:
I forgot to mention that I'm referring to the 1.5.2-incubating version available on Maven. Presumably this has been fixed in trunk?

Jim

On 20/02/13 11:53, Jim foo.bar wrote:
Hi everyone,

I'm pretty sure we had this discussion last year and that it was fixed! Basically, whenever any NameFinder recognises a single-word token, the resulting span is something like this:
 (#<Span [3..3)> #<Span [6..6)>)

while I think it should have been (#<Span [3..4)> #<Span [6..7)>).
As a result, the following exception is thrown:

 StringIndexOutOfBoundsException String index out of range: -1
 java.lang.AbstractStringBuilder.substring (AbstractStringBuilder.java:872)
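
To make the failure concrete, here is a minimal, self-contained sketch of how an empty half-open span can lead to that kind of exception. It assumes the conversion joins tokens with trailing spaces and then trims the last character; TokenSpan, HalfOpenSpanDemo and spanToString are hypothetical stand-ins, not the OpenNLP classes, and the exact exception message depends on the Java version.

```java
// Hypothetical stand-in for a token span; NOT the OpenNLP Span class.
final class TokenSpan {
    final int start; // index of the first covered token (inclusive)
    final int end;   // index one past the last covered token (exclusive)

    TokenSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

final class HalfOpenSpanDemo {

    // Assumed shape of the conversion: append "token " for each covered
    // token, then drop the trailing space. For an empty span the builder
    // stays empty, so substring(0, -1) throws.
    static String spanToString(TokenSpan span, String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = span.start; i < span.end; i++) {
            sb.append(tokens[i]).append(' ');
        }
        return sb.substring(0, sb.length() - 1);
    }

    public static void main(String[] args) {
        String[] tokens = {"azestapine", "treatment", "is", "10",
                           "times", "more", "effective", "."};

        // [3..4) covers exactly one token.
        System.out.println(spanToString(new TokenSpan(3, 4), tokens)); // prints "10"

        // [3..3) covers no tokens at all.
        try {
            spanToString(new TokenSpan(3, 3), tokens);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("empty span failed: " + e.getMessage());
        }
    }
}
```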


I am 99% positive that we've fixed this in the past; at least my private OpenNLP build behaves as expected. Just in case I'm doing something wrong, here are my steps:

- Create a RegexNameFinder, passing the following regexes in an array: "\d+", "\w+ive?"
- Call find on it, passing the following text in an array: ["azestapine" "treatment" "is" "10" "times" "more" "effective" "."]
- I get back the aforementioned spans (#<Span [3..3)> #<Span [6..6)>)
- Trying to convert them to a string array (via Span/spansToStrings) doesn't work!
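
For comparison, the spans I would expect from those inputs can be sketched with plain java.util.regex. This is a simplified stand-in that matches each token on its own; TokenRegexSketch and its find method are hypothetical, and this is not how RegexNameFinder is implemented. It only shows the expected half-open result for single-token matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical token-level matcher, not the OpenNLP RegexNameFinder.
final class TokenRegexSketch {

    // Returns one {start, end} pair (end exclusive) for every token that
    // is fully matched by at least one of the patterns.
    static List<int[]> find(String[] tokens, Pattern... patterns) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            for (Pattern p : patterns) {
                if (p.matcher(tokens[i]).matches()) {
                    spans.add(new int[] {i, i + 1}); // single token: [i..i+1)
                    break;
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"azestapine", "treatment", "is", "10",
                           "times", "more", "effective", "."};
        List<int[]> spans = find(tokens,
                Pattern.compile("\\d+"),
                Pattern.compile("\\w+ive?"));
        for (int[] s : spans) {
            System.out.println("[" + s[0] + ".." + s[1] + ")");
        }
        // prints [3..4) and then [6..7)
    }
}
```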


Any ideas? This is quite important, isn't it?

Jim








