"Gene" <[EMAIL PROTECTED]> writes:

> I agree with much of what you say!  I just think character sorting into
> hamming-1 order is cool, especially since it generalizes to a true sort
> with just a slight modification.

Hey, I think it's cool too :-) BTW, how exactly do you generalize it?

> There are a few other things to think about however.
>
> First, "simplest" depends on environment.  In perl you can compare two
> strings $a and $b with
>
> if (join('', sort(split '', $a)) eq join('', sort(split '', $b)) {
>   # a is a permutation of b!
> }
>
> Histogram code would be more complex and would probably run slower.

Why, can be pretty simple too.  Something like:

bool is_perm(char* a, char* b)
{
  while (*a) hist[*(a++)]++;
  while (*b) if (--hist[*(b++)] < 0) return true;
  return false;
}

> Another thing is that it's meaningless to talk about O(2N) because
> O(2N)=O(N).

Yes, O(2N)=O(N), I just wanted to emphasize the fact that you need two
linear scans, apparently not a good idea.


> Finally, for characters of w bits, the histogram requires O(2^w log N)
> space in the general case, not O(w) as you say.

My bad.  I should be paying more careful when I post.  I meant
O(2^w).  Why O(2^w logN)?

> Moreover, run time is also O(2^w) to compare those histograms.  For
> example, if you are working in Unicode where characters might have up
> to 4 bytes, you'd need 4 billion histogram entries (perhaps 16
> gigabytes), and it would take a while to compare these.

Very true.  Histogram approach would not be suitable for 4 byte
characters.  It'd be best for a relatively small number of posible
charactes.  However, please note that you don't need to compare the
histograms (although those were my words).  You just need to
accumulate the histogram of the first string, and then just decrement
the counts when scanning the second one; since you would know that the
lengths are the same, a trial to decrement a zero would mean the
strings are not permutations.  All this is O(N).  Of course this
assumes that you have initialized the histograms with zeros, which by
itself might be O(2^w), although it might be optimized by the compiler
if you use memset.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/algogeeks
-~----------~----~----~----~------~----~------~--~---

Reply via email to