Re: Span/spansToStrings still broken?

James Kosin Wed, 20 Feb 2013 19:07:35 -0800

Let's see,

We had OPENNLP-297, 298 ....
I think OPENNLP-333, 417 may also apply.

We can fix it, but we can't modify the indexes returned by the Span classes. All the other code relies on the odd state of the indexes being there.

The Span.spansToStrings() function can probably be modified to address this ... I'll have a look and see what is wrong.


James Kosin



On 2/20/2013 8:25 PM, James Kosin wrote:

Jim,
I thought I worked on a related issue; could you check to see if it was fixed in trunk?
Thanks,
James Kosin

On 2/20/2013 9:52 AM, Jim foo.bar wrote:
To be honest I couldn't remember if I opened a ticket for it, so I had a quick look through jira but couldn't find any related ones... I will open the ticket this afternoon and provide a patch as well...
Jim


On 20/02/13 13:57, Jörn Kottmann wrote:
Did you open a jira for it as suggested by Lance? Do you recall the issue number?
We should have the fix for it in the 1.5.3 release.

Jörn

On 02/20/2013 02:14 PM, Jim foo.bar wrote:
A bit of googling and I managed to locate the thread from November! Here it is:
http://mail-archives.apache.org/mod_mbox/opennlp-users/201211.mbox/%3c509bec16.7050...@gmail.com%3E
I reported it and fixed it back then, but I can't remember whether I communicated my fix to you guys... I'll investigate my private fork and try to spot the differences and I'll let you know what happens... I think it was a minor bug... there was a '-1' somewhere if I'm not mistaken...
Jim


On 20/02/13 13:00, Jim foo.bar wrote:
OK, sorry I rushed earlier... Now I remember what happened 8-9 months ago... It's not Span.spansToStrings() that has the problem but the RegexNameFinder instead! Calling the .find method of the RegexNameFinder returns spans of the form I mentioned earlier (#)... I do remember fixing this but I'm not sure I submitted a patch... Can anyone shed some light or should I go back to diff my sources?
Jim


On 20/02/13 12:16, Jim foo.bar wrote:
I forgot to mention that I'm referring to the 1.5.2-incubating version available on maven. Presumably this has been fixed in trunk?
Jim

On 20/02/13 11:53, Jim foo.bar wrote:
Hi everyone,
I'm pretty sure we had this discussion last year and that it was fixed! Basically, whenever any NameFinder recognises a single-word token the resulting span is something like this:
 (# #)

while I think it should have been (# #).
As a result the following exception is thrown:
StringIndexOutOfBoundsException String index out of range: -1
java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:872)
I am 99% positive that we've fixed this in the past... at least my private openNLP build behaves as expected. Just in case I'm doing something wrong, here are my steps:
- create a RegexNameFinder passing the following regexes in an array: "\d+", "\w+ive?"
- call find on it passing the following text in an array: ["azestapine" "treatment" "is" "10" "times" "more" "effective" "."]
- I get back the aforementioned spans (# #<Span[6..6)>)
- trying to convert them to a string array (via Span/spansToStrings) doesn't work!
Any ideas? This is quite important, isn't it?

Jim
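
For anyone following the thread, here is a minimal, self-contained sketch of the failure mode described above. It does not use OpenNLP at all: TokenSpan, findSingleTokens, naiveJoin and guardedJoin are hypothetical names made up for illustration, and the trailing-space trimming in naiveJoin is only an assumption about how such a conversion might be written, chosen because it runs into the same kind of "index out of range: -1" error on an empty span such as [6..6). The sketch treats spans as half-open, so a single token at index 6 is [6..7).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SpanSketch {

    /** Hypothetical stand-in for a token span; half-open: [start, end). */
    static final class TokenSpan {
        final int start;
        final int end;

        TokenSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ")";
        }
    }

    /** Emits a one-token span [i, i + 1) for each token fully matched by any pattern. */
    static List<TokenSpan> findSingleTokens(String[] tokens, Pattern[] patterns) {
        List<TokenSpan> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            for (Pattern p : patterns) {
                if (p.matcher(tokens[i]).matches()) {
                    spans.add(new TokenSpan(i, i + 1)); // end is exclusive
                    break;
                }
            }
        }
        return spans;
    }

    /** Assumed naive conversion: append "token ", then cut the last space. */
    static String naiveJoin(TokenSpan span, String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = span.start; i < span.end; i++) {
            sb.append(tokens[i]).append(' ');
        }
        // For an empty span sb.length() is 0, so the end index becomes -1.
        return sb.substring(0, sb.length() - 1);
    }

    /** Guarded conversion: an empty span simply yields an empty string. */
    static String guardedJoin(TokenSpan span, String[] tokens) {
        return String.join(" ", Arrays.copyOfRange(tokens, span.start, span.end));
    }

    public static void main(String[] args) {
        String[] tokens = {"azestapine", "treatment", "is", "10",
                           "times", "more", "effective", "."};
        Pattern[] patterns = {Pattern.compile("\\d+"), Pattern.compile("\\w+ive?")};

        for (TokenSpan span : findSingleTokens(tokens, patterns)) {
            System.out.println(span + " -> " + naiveJoin(span, tokens));
        }

        // An empty span like the one reported in this thread.
        TokenSpan empty = new TokenSpan(6, 6);
        System.out.println(empty + " -> \"" + guardedJoin(empty, tokens) + "\"");
    }
}
```

Under these assumptions the problem can be tackled at either end: the finder can emit an exclusive end index (i + 1) for single-token matches, or the conversion can be written so that an empty span does not trigger a negative substring index.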
