On Sat, Aug 07, 2010 at 07:25:20PM +0100, Luke Pebody wrote:
> If it does work, step 3 can be simplified. By the time you reach step
> 3, there are no substrings of the form XX or XYX. Thus it must be of
> the form XYZXYZXYZ...

Hey, thanks!
I've been worrying about the "If it does work" part, too, and spotting
this pattern made it much easier to construct a proof. It's a bit
verbose, but here it is.

First, I'll import rules 1 through 4 from the contest analysis,
and then prove a few more rules.

Rule 5: If the next letter to be printed is X, it is never useful
to push a letter different from X. A sequence like:
  push Y, push X, ..., pop that X
can always be rewritten to
  push X, ..., pop that X, push Y
which has the same cost and leaves the stack the same. The operations
in the elided part cannot depend on the Y because it is hidden by the X.

Rule 6: A sequence of pops is only useful if it leaves the next letter
to be printed at the top of the stack (except at the end, when emptying
the stack).
Proof: A sequence of pops can only be followed by another pop, by a
print (which satisfies rule 6), or by a push. If it's a push, then
we have a sequence like:
  pop, push X, ..., pop that X
which can be rewritten to
  push X, ..., pop that X, pop
which has the same cost and leaves the stack the same.

The correctness of my Step 1 follows directly from Rule 1. If there is
an optimal solution for the output of Step 1, then an optimal solution
for the input of Step 1 can be constructed by just adding print statements.
The solutions will have the same number of pushes.

The proof of correctness of Step 2 requires some definitions. Let's
look at a single iteration of Step 2.
 - The input string, which has at least one substring matching XYX,
   is called StringXYX
 - The output string, which has the leftmost XYX replaced by just X,
   is called StringX
 - Suppose we have an optimal sequence for printing StringX, which
   follows rules 1-6 above. Call it SequenceX.
 - From SequenceX we can construct a sequence for printing StringXYX,
   which follows rules 1-6 above and has exactly one more push than
   SequenceX. Call it SequenceXYX.
Step 2 is correct if SequenceXYX is optimal.

Now, let's look at SequenceX right after it prints the X in question.
To get a sequence for StringXYX we have to add "print Y" and "print X"
after it, together with stack manipulations to make it work.
The options for this are limited by rules 1-6: before each print
statement we can either push the required letter, or pop down to the
required letter. That leaves 4 possibilities to examine.

Option 1: push Y, print Y, pop Y (leaving X), print X
This is the straightforward option. It adds one push, as expected.
There's no way to make any further savings because the state at
the boundaries of the XYX in SequenceXYX is identical to the state
before and after the X in SequenceX, which is optimal.

Option 2: push Y, print Y, push X, print X
This would create an XYX sequence on the stack, which according to
Rule 4 is never necessary (it simplifies to Option 1), so we can
disregard this case.

Option 3: pop down to some Y, print Y, push X, print X
This works if there is an Y already on the stack. Just like Option 1
it has one more push than SequenceX. It leaves the stack a little
shorter, so other parts of SequenceXYX have to be modified to account
for this. Can those modifications reduce the number of pushes again,
so that the final cost of SequenceXYX is less than in Option 1?
No, because the sequence of pops done here can have no effect
until the X that was pushed is popped again, so an equivalent modification
can be constructed for SequenceX that does the same sequence of pops
at that point. And SequenceX is optimal, so that modification will not
save a push.

Option 4: pop down to some Y, print Y, pop down to some X, print X
This works if an Y and an X are available on the stack.
Unlike the other options it does not add a push, but it does leave
the stack shorter so other parts of SequenceXYX have to be modified
to account for this. Those modifications will result in at least one more
push elsewhere. Proof: we're inserting this sequence just after a
print X, so the stack must have the form X ... Y ... X here.
If we look back along the operations of SequenceX that led to this,
we will find a place where it pushes that second X instead of poppping
down to the X already on the stack. Since a push adds to the cost, and
SequenceX is optimal, this must mean that popping instead of pushing
there would have required at least one extra push elsewhere.
SequenceXYX does those pops in a different place but the same
logic as in Option 3 applies.

Conclusion: all options for constructing SequenceXYX out of SequenceX
involve at least one extra push, and there is an optimal solution that
adds exactly one push. This proves the correctness of Step 2.

The proof of correctness of step 3 depends on Luke Pebody's
observation that at this point the string is of the form XYZXYZXYZ...
and his calculation of (2n div 3) + 1 pushes for a string of length n.

Let's consider the string XYZXYZ. Due to rules 5 and 6 and the fact
that the stack starts empty, there must be an optimal solution that
starts with push X, print X, push Y, print Y, push Z, print Z.
Now we have XYZ left to print, and XYZ on the stack but in the wrong
order. We can get only one of XYZ by popping so we have to push two
of them, for a total of 5 pushes. The options are:
  pop, pop (leaving X), print X, push Y, print Y, push Z, print Z
  push X, print X, pop, pop (leaving Y), print Y, push Z, print Z
  push X, print X, push Y, print Y, pop, pop (leaving Z), print Z
All three leave XYZ on the stack, so if another XYZ sequence is
added then the same logic applies: it takes two more pushes and
will leave XYZ on the stack. This makes Step 3 valid for lengths
divisible by 3. (It is possible to do variations, like using three
pushes for the second XYZ and only one push for the third, but that
just moves the pushes around. Because of the order of letters on the
stack it always takes two pops to save one push, so at most a third
of the letters can be done with pops.)

Now consider strings that end on X or on XY. To match the calculation,
a string that ends on XY should take one more push than the one without
it, and a string that ends on X should take no extra pushes.
We can use an extra rule here.

Rule 7: As a consequence of Rule 6, it is never useful to pop the stack
empty except at the end, which means the letter that is pushed first
will still be on the stack when the last letter is printed.

If the string ends on X then Rule 7 implies that the final X does not
require a push. Thus, the cost of this string is the same as the cost
of the string without the final X.

If the string ends on XY then there is a solution that only uses one
extra push: use the strategy above, that leaves XYZ on the stack, and
then use pops to get X or Y and use a push for the other one. It is not
possible to do better than this: if both the X and the Y are done with
pops, then the stack must have the form X ... Y ... X here (an X on
top because we print it and then pop it, and an X at the bottom due to
Rule 7). By the same reasoning as for Option 4 in Step 2, arranging
the stack to allow these pops would mean using at least one extra
push elsewhere.

This proves the correctness of Step 3 for string lengths >= 6.
Correctness for shorter strings is easy to verify and I'll skip it here.

So, there's my proof for the linear solution. Feel free to poke holes
in it, and feel especially free to awe me by restating the whole thing
in two paragraphs :)

-- 
Richard Braakman
(competing as Amtep)

-- 
You received this message because you are subscribed to the Google Groups 
"google-codejam" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-code?hl=en.

Reply via email to