--- "Jackson, Harry" <[EMAIL PROTECTED]> wrote: > > I have been looking back at this problem and here is what I have found. > > Lets take the following set of times > A , B > 1 (4 , 5) > 2 (9 , 10) > 3 (11 , 12) > 4 (12 , 14) > 5 (12 , 18) > 6 (14 , 15) > > If we sort by column A the set inherits the following property or Rule > > a1 <= a2 is always true > > This means that we are only interested b1 properties in relation to a2 and > b2. If (b1 >= b2) then because of Rule 1 we can ignore the values in the > second set and remove them.
> If (b1 >= b2) then, because of Rule 1, we can ignore the values in the
> second set and remove them.

This seems wrong: why (b1 >= b2)?  That is an assertion that we aren't
looking backwards in time (let's assume the input has been checked for
this - it makes the calculations easier; I could solve it with a double
pass).  If I assume you meant (a1 >= b2), then it is not valid:

>         A , B
>     1  (4 , 5)
>     2  (9 , 10)

This is already sorted on A, and (a1 >= b2) can never be true, since
a1 <= a2 and a2 <= b2.  So let's assume you meant (b1 >= a2):

>         A , B
>     1  (4 , 5)
>     2  (9 , 10)

Now, if 5 >= 9 held, we could merge these, the new set being (4, 10).
In this case it doesn't, so they stay separate.

> If this is not the case then if (b1 < a2) we can safely deduce that it
> has no overlap in any other set from Rule 1, and calculate the time and
> store it.

I think this might be true, providing you have collapsed the above.  It
warrants someone better at algorithms than me to look at it.

> If however (b1 < a2) is false then because of Rule 1 we can safely
> replace b1 with b2 and remove the second set.

I'm lost. :P

> The code I have used to get the results is at the end.  I have no idea
> if it is a fast solution.  I am not sure if it is the best way to code
> it either, and I am sure some people can optimise it for best
> performance.  I wanted to use a for loop but went for the easier while
> loop instead.

If you only have to sort once, and you can then step through in a linear
fashion, it will be pretty fast: one O(n log n) sort plus an O(n) pass.
It is not dissimilar to what I did (and nobody has complained yet).  It
does, however, have an unfortunate side effect: the original sample of
data gets destroyed - but that is easily fixable.

The time complexity of our solutions is going to be the same [I think].
Improvements can be made, though, if we preprocess the input data, which
is not unlikely.  Imagine a calendar with associated timeslices, where
you can do some preprocessing and even store the result to disk.  I was
going to try using an AVL tree, but someone hasn't written Tree::AVL
yet!  (See the P.S. below for the shape of the idea.)  I don't have time
just now, so it will have to stay on the back burner for the moment.

> Some other considerations may have to be addressed if it is going to
> become a module.

Some of which I've covered.

> For thousands of records: if we find the greatest stop time, then if
> our searching set ever has this value as its stop time, we can
> calculate the total and exit without iterating over any more sets,
> providing the set has been sorted on column A.

Huh?

> When we have sets with matching start times we are only interested in
> the set with the largest difference, and vice versa for the stop times.

The time complexity to detect that case is poor, is it not?

Jonathan Paton

=====
s''-//--/\\///|-\/\|--\--/-\-/\-//\-|/\\\|/\///|-\--\\\\',
s''/-\\\/|///|-|/|/--\--/--//\|\/\||/|/-/\\\-/\///|-\-\-',
y'|\/-'3210',$_=join qq\\,map{s|2|10|||s|3|11|||s|^|0|;$_}
m|.|g;map{print chr unpack'N',pack'B32','0'x24 .$_}/.{8}/g
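P.S. On the preprocessing idea: what an AVL tree (or any ordered
structure) would buy us is cheap insertion of a new set into an
already-merged list, instead of re-sorting everything.  Here is a rough,
untested sketch of the same effect using a plain sorted array and a
binary search (insert_set is just a name I made up, not a module):

    # Keep @merged sorted and non-overlapping; work a new [start, stop]
    # set in without re-sorting the whole list.
    sub insert_set {
        my ($merged, $a2, $b2) = @_;

        # Binary search: find the first set whose stop >= the new
        # start.  Everything before it ends too early to overlap.
        my ($lo, $hi) = (0, scalar @$merged);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            $merged->[$mid][1] >= $a2 ? ($hi = $mid) : ($lo = $mid + 1);
        }

        # Swallow every set that overlaps [a2, b2], widening as we go.
        my $i = $lo;
        while ($i < @$merged && $merged->[$i][0] <= $b2) {
            $a2 = $merged->[$i][0] if $merged->[$i][0] < $a2;
            $b2 = $merged->[$i][1] if $merged->[$i][1] > $b2;
            splice @$merged, $i, 1;
        }
        splice @$merged, $i, 0, [$a2, $b2];
    }

For example, insert_set(\@merged, 6, 9) on (4,5),(9,10),(11,18) gives
(4,5),(6,10),(11,18).  The search is O(log n); a real tree would also
avoid the O(n) splices.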