* You said that allFindings was already sorted, maybe allFindings should be a SortedSet already, or you don't need to sort outside anymore. * You don't need to pass in a comparator because Span already implements Comparable. I think it is better to use the same comparable implementation so the order is consistent with the rest of the code.
Ok, anyway, I think it is still too complex. Check how the method NameFinderME.dropOverlappingSpans is simpler. On Wed, Apr 18, 2012 at 10:47 AM, Jim - FooBar(); <[email protected]>wrote: > William, > > Were you thinking something along these lines? > ------------------------------**------------------------------** > -------------------- > private Span[] untangle2(List<Span> allFindings){ > > List<Span> problems = new ArrayList<Span>();//all the overlaps > SortedSet<Span> cleanFindings = new TreeSet<Span>(new > Comparator<Span>() { > @Override > public int compare(Span s1, Span s2) { > return (s1.getStart()-s2.getStart());**//sort in ascending order > of start offsets > } > }); > > > for (int i=1;i<allFindings.size();i++){**//start from 1 > Span current = allFindings.get(i); > Span previous = allFindings.get(i-1);//safe > > if (current.intersects(previous)|**|current.crosses(previous)){ > if (current.getType().equals(**previous.getType())){//if same type > > if ((current.length() - previous.length()) > 0) > cleanFindings.add(current); //current is longer > else > cleanFindings.add(previous); //previous is longer > > > } > else { //add both as problems > problems.add(current); > problems.add(previous); > } > } > else{ //add both as clean - each iteration will produce one > duplicate > //which will be blocked by the set > cleanFindings.add(previous); > cleanFindings.add(current); > } > } > //by this point the set is ordered and contains no nulls > //----------------------------**---------------------------- > > if(problems.isEmpty()) //if no problems so far do the usual > return cleanFindings.toArray(new Span[cleanFindings.size()]); > else > return sortProblems(cleanFindings, problems);//don't know what to do > in this method > } > ------------------------------**------------------------------** > ------------------------------**------------------------------**--- > > Jim > > > > On 18/04/12 14:19, Jim - FooBar(); wrote: > >> Actually the sortedSet would result in a lot more concise and cleaner >> code...ok I'll translate it into a version with Sets and then we put them >> side by side and decide which one is prettier (and perhaps more >> performant)... >> >> Jim >> >> On 18/04/12 14:11, Jim - FooBar(); wrote: >> >>> William, >>> >>> I did think about using a sortedSet - it is very convenient since it can >>> take a comparator as an argument to determine the ordering (exactly what we >>> need)...however i preferred to stay faithful to the general way openNLP is >>> implemented which is using lists and arrays mainly - my guess for >>> performance... >>> >>> Trust me, i hate writing code which mutates data-structures, but that is >>> the imperative way, hence the Java way...i could try a more functional >>> approach...let's see how this evolves at the moment and your suggestion >>> will be seriously considered at a later stage...do you think performance >>> will be affected at all when using sets instead of lists?I'm reading on >>> various articles that if you're sorting just once a list could be slightly >>> faster...on the other hand if one wants the items to be sorted at any given >>> time then probably a sorted set is better...that makes sense i guess. >>> >>> Jim >>> >>> >>> On 18/04/12 13:47, [email protected] wrote: >>> >>>> Jim, >>>> >>>> A comment about your code. Maybe instead of working with sort methods, >>>> lists and arrays you should work with SortedSet. This would help you to >>>> avoid duplicates and the order would be preserved after a removal. I >>>> think >>>> you could also avoid the nulls, maybe creating an output set instead of >>>> modifying the input. >>>> >>>> Note: I checked the Span class and there is an issue with it. The >>>> compareTo >>>> method is not checking the Span type, and it inconsistent with equals. I >>>> will open a Jira and fix it. >>>> >>>> On Wed, Apr 18, 2012 at 7:13 AM, Jim - FooBar();<[email protected]** >>>> >wrote: >>>> >>>> On 18/04/12 11:12, Jim - FooBar(); wrote: >>>>> >>>>> On 18/04/12 11:06, Jörn Kottmann wrote: >>>>>> >>>>>> On 04/18/2012 11:41 AM, Jim - FooBar(); wrote: >>>>>>> >>>>>>> I keep the longest span whenever i'm encountering 2 >>>>>>>> overlapping/intersecting spans of the same type...it makes sense to >>>>>>>> do that >>>>>>>> cos one of them must be wrong...what about different types though? >>>>>>>> >>>>>>>> When they are intersecting the following span could be longer, >>>>>>> thats >>>>>>> right. To keep things simple >>>>>>> I suggested to always keep the first span. >>>>>>> >>>>>>> Jörn >>>>>>> >>>>>>> well that sounds a bit arbitrary don't you think? It sounds like >>>>>> flipping >>>>>> a coin! >>>>>> also I'm not talking about span length here...if the types are >>>>>> different >>>>>> it makes no sense to compare lengths...the chances of any of them >>>>>> being the >>>>>> correct one are equal...if we always keep the earliest one it is easy >>>>>> to >>>>>> implement but we're hard-coding something with no good >>>>>> justification...see >>>>>> what i mean? >>>>>> >>>>>> Jim >>>>>> >>>>>> I wish there was a way to keep both... >>>>> >>>>> Jim >>>>> >>>>> >>>>> >>> >> >
