Terry wrote:
> Gene wrote:
> > I'm sorry, but the response below is wrong on several levels. See
> > notes.
> >
> > C Erler wrote:
> > > Because there are infinite integers, there is no need for a list of
> > > 4,300,000,000 of them to contain a duplicate. If there are
> > > 4,299,999,999 choices, there will be one duplicate. Because the
> > > solution uses binary search, the list is sorted.
> >
> > The problem is talking about 32-bit integers. There are 4,294,967,296
> > of these, so 4,300,000,000 must contain a duplicate.
> >
> > Binary search for a _solution_ only means you repeatedly halve the
> > candidate set. Data need to be sorted when the problem is to find a
> > particular element. Here you are solving a different problem entirely.
> > Indeed if the ints are sorted, the algorithm is trivial: make one pass
> > through them and look for a duplicate adjacent pair.
> >
> > >
> > > So, the full problem is probably something like the following. There
> > > is a sorted list of the first 4,299,999,999 positive integers. Someone
> > > duplicates one of them and inserts it into the list right next to the
> > > original. Figure out which integer is duplicated.
> > >
> > > With that, it is easy to solve. If you look at the first 5000
> > > integers, the last one should be 5000. If the last one is 4999, you
> > > know that the duplicate is in that sublist.
> > >
> > > So, basically, you take the first half or so of the list and see if the
> > > last one is too low. That will tell you which half of the list has the
> > > duplicate. Repeat until your sublist has only the duplicated entry.
> >
> > As stated above, the data aren't sorted.
> >
> > The algorithm that Bentley is talking about works by repeatedly halving
> > the candidate range in which the duplicate element must lie. Initially
> > this range is 0..2^32-1. On each pass it throws away the half of the
> > range that contains fewer data values and it also throws away the data
> > lying in that half. Eventually the range decreases to a single value,
> > which must be a duplicate because the remaining data has at least two
> > elements. The problem that Bentley notes is that the data may still
> > have 4 billion elements at this stage! The final improvement he
> > mentions is that you can throw away data _inside_ the candidate range
> > as long as you keep enough data around to ensure that at least one
> > duplicate occurs in it. This is equal to the size of the current
> > candidate range plus 1!
>
> Hi Gene,
> I don't clearly understand what you are saying .
> What if we had a list of 8 elements like
> 1 3 2 4 2 5 6 7
> How are we going to find the element.
Hi Terry. Your example doesn't really apply to this problem. If you
assume integers have 3 bits (rather than 32), then Bentley's algorithm
will work only if you have a list longer than 2^3=8 elements. So let's
fix up the example by adding a zero at the end of your list.
So here is pseudocode for Bentley's proposed algorithm (without the
final improvement):
lo = 0; // bottom of range
m = 8; // number of integers currently in range
tape = [1,3,2,4,2,5,6,7,0]; // list to find dup in
while (m > 1) {
s = lo + m/2; // base of hi half of range
tape_lo = []; // set work tapes to empty
tape_hi = [];
foreach i from tape
if (i < s) add i to tape_lo else add i to tape_hi
if (tape_lo has more integers than tape_hi
use tape_lo for tape in next pass
else {
use tape_hi for tape in next pass
lo = s;
}
m = m/2;
}
// answer is in lo
On your data, you get
lo=0 m=8 s=4 tape=(1 3 2 4 2 5 6 7 0)
lo=0 m=4 s=2 tape=(0 2 2 3 1)
lo=2 m=2 s=3 tape=(3 2 2)
answer=2
Now the need for improvement shows up if you use a list
[7,7,7,0,7,7,7,7]
as input.
lo=0 m=8 s=4 tape=(7 7 7 0 7 7 7 7 7)
lo=4 m=4 s=6 tape=(7 7 7 7 7 7 7 7)
lo=6 m=2 s=7 tape=(7 7 7 7 7 7 7 7)
7
You can see that all the 7's get copied over and over, so the run time
per pass is not decreasing. The improvement is to note that once you
have m+1 characters on tape_lo or tape_hi you can stop writing values
and the algorithm still works fine.
In case you want to try other examples, here is a perl code that
implements the algorithm for 3 bits. It includes the improvement to
limit tape length.
sub main {
my $lo = 0;
my $m = 8;
my $tape = [1,6,7,0,2,4,5,3,1];
while ($m > 1) {
my $s = $lo + $m/2;
print "lo=$lo m=$m s=$s tape=(@$tape)\n";
my $tape_lo = [];
my $tape_hi = [];
foreach my $i (@$tape) {
if ($i < $s) {
push @$tape_lo, $i unless scalar @$tape_lo > $m
}
else {
push @$tape_hi, $i unless scalar @$tape_hi > $m
}
}
if (scalar @$tape_lo > scalar @$tape_hi) {
$tape = $tape_lo;
}
else {
$tape = $tape_hi;
$lo = $s;
}
$m /= 2;
}
print "$lo\n";
}
main;
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Algorithm Geeks" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---