Let's see,
We had OPENNLP-297, 298 ....
I think OPENNLP-333, 417 may also apply.
We can fix this, but we can't modify the indexes returned by the Span
classes. All the other code relies on the odd state of the indexes being there.
The Span.spansToStrings() function can probably be modified to address
this. I'll have a look and see what is wrong.
James Kosin
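For illustration only, here is a minimal sketch of what a more tolerant token join could look like. It is not the OpenNLP code: the class TolerantSpanJoin and the method joinTokens are hypothetical names, and spans are passed as plain start/end token indexes (end exclusive) rather than Span objects. The point is only that a join which never trims a trailing separator returns an empty string for an empty span instead of throwing.

```java
import java.util.StringJoiner;

public class TolerantSpanJoin {

    // Hypothetical helper, not part of OpenNLP.
    // Joins tokens[start] .. tokens[end - 1] with single spaces.
    // An empty range (start == end) yields "" rather than an exception.
    static String joinTokens(int start, int end, String[] tokens) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int i = start; i < end; i++) {
            joiner.add(tokens[i]);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"azestapine", "treatment", "is", "10",
                           "times", "more", "effective", "."};
        System.out.println("[" + joinTokens(3, 4, tokens) + "]"); // one token
        System.out.println("[" + joinTokens(5, 7, tokens) + "]"); // two tokens
        System.out.println("[" + joinTokens(3, 3, tokens) + "]"); // empty span
    }
}
```

Note that this only avoids the exception. If a finder returns [3..3) where [3..4) was meant, the result is still an empty string, so the finder itself would still need fixing.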
On 2/20/2013 8:25 PM, James Kosin wrote:
Jim,
I thought I worked on a related issue, could you check to see if it
was fixed with trunk?
Thanks,
James Kosin
On 2/20/2013 9:52 AM, Jim foo.bar wrote:
To be honest, I couldn't remember whether I opened a ticket for it, so I had
a quick look through JIRA but couldn't find any related ones. I
will open the ticket this afternoon and provide a patch as well.
Jim
On 20/02/13 13:57, Jörn Kottmann wrote:
Did you open a JIRA issue for it, as suggested by Lance? Do you recall the
issue number?
We should get the fix for it into the 1.5.3 release.
Jörn
On 02/20/2013 02:14 PM, Jim foo.bar wrote:
A bit of googling and I managed to locate the thread from November!
Here it is:
http://mail-archives.apache.org/mod_mbox/opennlp-users/201211.mbox/%3c509bec16.7050...@gmail.com%3E
I reported it and fixed it back then, but I can't remember whether
I communicated my fix to you guys. I'll investigate my private
fork, try to spot the differences, and let you know what
happens. I think it was a minor bug: there was a '-1' somewhere,
if I'm not mistaken.
Jim
On 20/02/13 13:00, Jim foo.bar wrote:
OK, sorry, I rushed earlier. Now I remember what happened 8-9
months ago. It's not Span.spansToStrings() that has the
problem but the RegexNameFinder instead! Calling the .find method
of the RegexNameFinder returns spans of the form I mentioned
earlier (#<Span [3..3)>). I do remember fixing this, but I'm not
sure I submitted a patch. Can anyone shed some light, or should I
go back and diff my sources?
Jim
On 20/02/13 12:16, Jim foo.bar wrote:
I forgot to mention that I'm referring to the 1.5.2-incubating
version available on Maven. Presumably this has been fixed in trunk?
Jim
On 20/02/13 11:53, Jim foo.bar wrote:
Hi everyone,
I'm pretty sure we had this discussion last year and that it was
fixed! Basically, whenever any NameFinder recognises a single
word token the resulting span is something like this:
(#<Span [3..3)> #<Span [6..6)>)
while I think it should have been (#<Span [3..4)> #<Span [6..7)>).
As a result, the following exception is thrown:
StringIndexOutOfBoundsException String index out of range: -1
java.lang.AbstractStringBuilder.substring (AbstractStringBuilder.java:872)
I am 99% positive that we've fixed this in the past; at least
my private OpenNLP build behaves as expected. Just in case I'm
doing something wrong, here are my steps:
- Create a RegexNameFinder, passing the following regexes in an
array: "\d+", "\w+ive?"
- Call find on it, passing the following text in an array:
["azestapine" "treatment" "is" "10" "times" "more" "effective" "."]
- I get back the aforementioned spans (#<Span [3..3)> #<Span
[6..6)>).
- Trying to convert them to a string array (via
Span/spansToStrings) doesn't work!
Any ideas? This is quite important, isn't it?
Jim
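To make the failure above concrete, here is a small self-contained sketch that does not use OpenNLP at all. TokenSpan and join are hypothetical stand-ins, and the join strategy (append each token plus a space, then cut off the trailing space) is an assumption about how the conversion behaves, chosen because it is consistent with the substring frame in the stack trace. With a half-open span [3..4) the join works; with the empty span [3..3) nothing is appended and the trimming substring call fails.

```java
public class EmptySpanDemo {

    // Hypothetical stand-in for a token span covering [start, end).
    static final class TokenSpan {
        final int start;
        final int end;

        TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumed join strategy: append "token " for each covered token,
    // then drop the trailing space. For an empty span the builder is
    // empty, so substring(0, -1) throws StringIndexOutOfBoundsException.
    static String join(TokenSpan span, String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = span.start; i < span.end; i++) {
            sb.append(tokens[i]).append(' ');
        }
        return sb.substring(0, sb.length() - 1);
    }

    public static void main(String[] args) {
        String[] tokens = {"azestapine", "treatment", "is", "10",
                           "times", "more", "effective", "."};

        // Expected form of the span for the single token "10".
        System.out.println(join(new TokenSpan(3, 4), tokens));

        // Form of the span reported in this thread.
        try {
            join(new TokenSpan(3, 3), tokens);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("empty span failed: " + e);
        }
    }
}
```

The exact exception message depends on the Java version, so the sketch only relies on the exception type.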