Re: I'll explain mine if you explain yours

Ton Hospel Sat, 26 Oct 2002 10:58:45 -0700

In private mail i've gotten a request to explain my solution and
what my "searcher" does in these postmortem solutions. I decided 
answer it here in case other people are interested.


Let's start with the basic solution as I have it on the scoreboard

The first observation is that input may looks like:

xxx billion yyy thousand zzz

so i'd like at some point to e.g. collect the yyy, then when I see
the "thousand", I want to multiply it by 1000 and add it to the
xxx billion I already have.

After some experimenting and counting lengths, i decided i want to 
keep the "working value" in one variable, so at the moment i see
the "thousand", the working variable will contain:

xxx billion yyy

and i want

xxx billion yyy thousand

which i can get by adding 999*yyy, and the yyy i can get with
accumulator%1000

Shortly after that I realised this can handle "hundred" too, even
if one hundred does not "finish" a group.

so a basic core of my solution will be:

if(thousand|million|billion|hundred) {
    $a += (number-1) * ($a%1000)
} else {
   do normal < 1000 collection
}

The do_normal will also contain a $a+= (to add the xxx to the accumulator),
so in more golfish style we get:

$a+=$a%1000*(number-1)||normal_stuff

which assumes that $a%1000 is never zero in the
thousand|million|billion|hundred case. Which is ok since the
input is things like "one thousand", not things like "thousand"
or "zero thousand". It also assumes that (number-1) *IS* zero
in the "normal number" case.

number-1 is respectively 9 nines, 6 nines, 3 nines or 2 nines. How to
generate that ? Like I said, I was originally only considering
billion, million and thousand, so I wanted to generate 3,2 and 1 times
999. And I noticed that billion and million had two l's, so I decided
to go count unique characters, and came up with

y/abl// 

to get the 3,2,1. Unfortunately "eleven" and "twelve" also contain
the letter l. These two however also contain "e" and "v", so we can
compensate for that if we subtract at least 1, so:

y/abl/-/e/

However, now I also want to handle "hundred". 99 is not a multiple of
3, so instead of 3,2,1 i want 9,6,3 for the basic three cases. so 
multiply by 3 and get ( now add the 9 multiplier too):

9x(3*y/abl/-3*/e/)

In this form it's clear that that "3" may also be a bigger number (it's ok
if eleven and twelve lead to a negative number instead of to 0).
The "hundred" contains a "e", so it will do the /e/ subtraction. That's
good, since we don't want "hundred" to lead to a multiple of 3, so we
can change the "3" to get some other values. Since we are going to subtract
at least 4, the 3*y/// must lead to at least 6, so we need characters that
are unique to "hundred" appearing at least twice in total. So that leads
us to consider "d". However, "thousand" also has a "d". Oh ! But that's just
as useable as "a",so we get:

9x(3*y/dbl/-4*/e/)

(And no other number like seventeen or twenty or whatever will make
this expression non-zero)

Next step is to handle the normal stuff (everything < 100). There is
quite a bit of regularity there. 

- if it ends on "ty" (in fact, if it contains "y" at all), it's 
  a number times 10. We can get that by e.g. putting a "0" at the 
  end of our number.
- if it ends on "teen" it's some number+10. In fact, if it contains
  "teen" at all. No ! Even "tee" is enough. And if we consider "ten"
  a special case, even "te" is good enough.
  And the number in number+10 will be a single digit in all cases.
  So we can get that by simply putting "1" in front
- All other numbers are either single digit (zero, one, etc), or we
  have (again) the exceptional cases of "eleven" and "twelve". Fortunately
  both are easy to recogize from e.g. the "l" in there.

So that leads to something like:
- If the middle term leads to 1 and 2 for "eleven" and "twelve":
  /te|l/.(handle_simple_number).0 x/y/
- If the middle term leads to 11 and 2 for "eleven" and "twelve":
  /te|lv/.(handle_simple_number).0 x/y/
(other cases exist but do not seem interesting)

So let's look at the simple numbers:
  zero
  one
  two
  three
  four
  five
  six
  seven
  eight
  nine
  (ten)
  (eleven)
  (twelve)

They are not unique in the first letter, but they ARE unique in the
first two letters. But not from the third letter on, since we want to
get e.g. 5 for both "five" and "fifty", and 2 for both "two" and
"twenty". Twelve is interesting in that if we look only at the first
two chars, it always is the same as "two", so will give us a 2 (that's
why I only considered formulas that map 12 to 2 above).

Ok, so the next step is a search for an expression that takes *exactly*
the first two chars from a string, and returns the correct number for 
them.

One obvious method is to make a little table, and just look it up.
E.g. in one of my early attempts I used
/..//2*$-[zeontwthfofisiseeiniQQel!~$&]

However, it's clear to see that the constants string is very long.
It would be nice if we could somehow combine the first two chars so
the lookup table is twice as short and doesn't need the /2.

Rick went for pack(H2,$_) as the combiner, I went for matching the
first char with /./ and then doing bitoperations on $& and the
first char of $' to create a unique character for each of the
simple numbers.

The first idea is to simply use $& & $', but unfortunately that does not
give a different char for each word. So I used some bitflipping to make
them unique. In the end I went for:

($& ^"o")& $'

Many other ways are possible, unfortunately whatever way I chose, the
lookup table alwayus contained non \w characters, so i could not drop
the quotes around the lookup string. So I just went for the lowest
tie-break. I'll use V instead of o here since it leads to more readable
chars. My initial versions were:

/./+index(')"4 &37%.Q!',$&^V&$')

These leave out the "zero" lookup to make the table one shorter. They 
also map "ten" to a non-possible char, so I can use /te|lv/ instead of
/tee|lv/. It would be nice if "eleven" automatically mapped to "1",
since then I can simply leave out the last two characters. But in fact,
we can ! The y/dbl// allows us to replace the l by something else. "l"
is the second letter of "eleven", so the combiner for "eleven" will
change, but it's only the fourth letter in "twelve", so that one will
not change. And we can choose this something else not to exist in any
simple number and such that it will combine for "eleven" to the same
character as "one" does. For better tiebreaker i also write the index
as a $+ lookup, and get:

y/dbl/\xe4/     and   /te|\xe4/././*$+['^A^S^\^I^O^Z^V^L^G'!~($&^o&$')]

The final step is to commify the resulting number, where i used the
suboptimal $_=$a;{s/\B..\d\b/,$&/&&redo

All that combined with a little tie break improvement leads to:

-lp040 
$^+=$^%1e3*(9x(3*y/dbl/\xe4/-4*/e/))||/te|\xe4/././*$+['^A^S^\^I^O^Z^V^L^G'!~($&^o&$')].$[x/y/}$_=$^;{s/\B..\d\b/,$&/&&redo

Which was my final submission.

However, i still was not very happy with how i went from "to first chars"
to corresponding number, and i had noticed that in fact "crypt" sort of
does what i want: crypt($pass, $salt) takes the first two chars from $salt,
then does some des-like encryption on $pass (modified by the salt-chars)
and then returns a string representation of this result. So the idea was 
to find some interesting $pass, such that if I somehow make a number out
of the resultstring the same mapping is done.

Making a number is not so hard with things like:

vec(crypt($pass, $_), $o, $b)
unpack(N, crypt($pass, $_))

unfortunately these numbers will be all over the map (especially in the 
unpack("N", ..) case). So to improve the chances to find a good $pass,
I decided to add a modulo so that more output numbers can give the 
result we want. While the challenge was still running I decided to
concentrate on 

vec(crypt($pass, $_), $o, 4)%$m

So I wrote a program that tries all $pass strings consisting of \w chars
and starting with a-zA-Z_ or - as $pass. It then does the crypt with
"nine" and sees on which positions ($o) the result is exactly 9, and if
there is such a case, continues with 8,7 and 6. After that point we can
make 5 either directly or with 15%10, so it also starts keeping track of
possible modulos.

Unfortunately "crypt" is pretty slow, even if you code this program in C.
So while I had the idea a few days before the challenge ended, my searcher
found nothing while it was still running. A few minutes after the end I 
found this very lucky hit:

k=  zrQ0W,o=16,b= 4,m=10:0,1,2,3,4,5,6,7,8,9,0,8

which you can check with the following perl code:

perl -wle 'print join",", map vec(crypt(zrQ0W,$_),16,4)%10, qw(zero one two three four 
five six seven eight nine ten eleven)'

And it indeed prints:
0,1,2,3,4,5,6,7,8,9,0,8

And we still have that trick to modify "eleven" in the y/dbl// up our
sleeves, so we can use that fixup that last 8 to a 1. So we get:
(adding a few Rick/Bob imporvements):

$^+=$^%1e3*(9x(3*y/dbl/Q//-4*/e/))||/te|Q//.vec(crypt(zrQ0W,$_),16,4)%10 
.e./y/}$_=$^;{s/\B(?=(...)*$)/,/g

Unfortunately this string has two big drawbacks:
 - the offset is 16, while it would be nice to need only one digit.
 - The 8 at the end causes us to need that Q as compensation. Another
   char lost

After that I got interested in how fast you can make crypt, and in
particular started looking at "john the ripper",
http://www.openwall.com/john/

Unfortunately that program does not have a direct "crypt", but generates
a kind of intermediate code. Then you also convert your target password
string to that intermediate code and then do the compare.

So I wrote some code to undo that, and for the rest stole the
implementation from that code wholesale, and put that in a program that
does a smart search over the possible $pass values (smart in the sense
that it tries to give up on a particular key as fast as possible so it
won't do useless crypts).

Once I had that, I started searching for cases that would not have a
two-digit offset or that would not need to fixup the "eleven" case.

Unfortunately none of these exist in the 5-char $pass space. And to gain
in the 6-char space, BOTH must be true. I finished searching the 6-char
space a few days ago, and no, none exist where both are true, though this
one was close:
k= _IEhBI,o=11,b= 8,m=10:0,1,2,3,4,5,6,7,8,9,0,1

(This is not yet completely certain. I have in fact not yet searched the
space consisting of pure numbers like -1.2e-4 or 123456)

So next i decided to try to get rid of the modulo at the end completely.
Since the 6-char space was already completely searched, I'm now trying for

vec(crypt($pass, $_), $o, 4)

in the 7-char space. The search-space is so big, I didn't really expect
success, but in fact after three hours I ran into:

k=DjcSj30,o=12,b= 4,m=16:0,1,2,3,4,5,6,7,8,9,0,5

Which again has both drawback (offset two digits and eleven is wrong),
but still gains because it has no modulo this time:

$^+=$^%1e3*(9x(3*y/dbl/W/-4*/e/))||/te|W/.vec(crypt(DjcSj30,$_),12,4).e./y/}$_=$^;{s/\B(?=(...)*$)/,/g

Chances are good a string exists with a one-digit offset and not needing
fixup for "eleven", but, if so, I haven't found it yet. I plan to keep
running this program in the background as my private strtol@HOME though.

If you are interested in the C-code of the current searcher, you can find
it at:

http://www.xs4all.nl/~thospel/ASIS/zlet5.tar.gz

The current version is for x86-mmx, though it's easy enough to get it to
work on other processors by compiling john the ripper and replacing arch.h
by the processor specific arch.h file john will give you.

This program might contain the fastest known existing implementation of 
full forward crypt (the pure crypt part does 530000 crypts/second on
my Athlon XP1800+). I'm very interested if anyone knows anything faster
(I'm in fact wondering about bryDES). (the openssl fcrypt does 120000/sec,
glibc ultra-fast-crypt does 40000/sec)

Compile it with some high optimisation, put a start string in a file
(e.g. I used A000000) and run it with that filename as argument
(it will update that file ever so often to mark where it will start now).
On my athlon XP1800 it works through a 5-char block in about 1 hour.
(the code I had during the challenge needed about 12 hours for that)
The code does about 440000 crypts/second, so the overhead is reasonable.
It can't be made much faster unless I improve the crypt().

Re: I'll explain mine if you explain yours

Reply via email to