> Mine's even faster now (at least on my computer, would you care to test it
> on yours? If you don't want to get EnumSet, just change DiffArray to
> Array, worked wonders for me). I'll dig into yours tomorrow to see what I
> can get out of it to improve my algorithm.
>
> Unfortunately, no new inference rules :-(

indeed, the crucial difference between our versions was that I'd avoided
group inference; I thought it would be expensive to implement and, based
on my own limited experience, rarely used - boy, was I wrong on both
counts: it is easily implemented and, with a little care for the inner
loop, not too inefficient either; and the examples I mentioned (on which
my solver needed guesswork) simply fall apart once grouping is added (it
was quite impressive to take the output of my old propagation code and
apply grouping by hand!).
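
to make that concrete, here is a tiny sketch of one restricted form of
group inference - the case where k cells of a group share the same
k-element candidate set - which is not necessarily the exact rule either
of our solvers uses; the ranges are plain lists and the names are mine:

  import Data.List (nub, (\\))

  type Range = [Int]   -- candidate digits for one cell

  -- if k cells of a group (row/column/box) share the same k-element
  -- candidate set, those k digits are locked into those cells and can
  -- be removed from every other cell of the group
  groupInfer :: [Range] -> [Range]
  groupInfer rs = foldr apply rs lockedSets
    where
      lockedSets = nub [ r | r <- rs
                           , length r > 1
                           , length (filter (== r) rs) == length r ]
      apply s    = map (\r -> if r == s then r else r \\ s)

for example, groupInfer [[1,2],[1,2],[1,2,3],[3,4],[5],[6],[7],[8],[9]]
leaves the two [1,2] cells alone but narrows the third cell to [3].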

now my solver takes slightly fewer guesses than yours, though yours is
still significantly faster (same factor of about 3 between them, both
having improved by a factor of 2 or so..).

> Two things I don't like about your code:
> 1. no type declarations

that's what type inference is for!-) I tend to add them only as debugging
assertions, when language limitations require them, or when I'm fairly
certain that the code is stable (i.e., dead;-). it is useful to think
about types, but engraving them into the code is overrated (any good IDE
should be able to show them as if they'd been written in anyway..;
lacking such an IDE, :browse Main will do the trick, though ghci's
current formatting is ugly!).

> 2. too much name shadowing, that makes following the code difficult

depends on which parts you mean - I don't like overusing primed or
numbered variables, and using the same variable name through a
sequence of steps is usually the last step before factoring that variable
out into some monad anyway (I did mention that the code was still
ad-hoc, not cleaned up).

> apart from that: clever

as in: you'd prefer it if I'd express myself more clearly?-) agreed.
I've been working on cleaning up the code (also trying to remove
stupidities as they become exposed in the process..).

> lists and adhoc code tend to be performance killers, I doubled the speed of
> mine by de-adhoccing the code (and that although I introduced the
> speed-killer DiffArray)

there's nothing wrong with lists, just as arrays are not synonymous with
good performance - it all depends on the usage patterns. there are several
interesting examples in this problem:

1 when I first tried EnumSet for the ranges, there was no substantial performance difference to a list-based representation.

   in theory, the change sounds like a clear win (constant vs linear time
   membership test), but (a) the ranges are small (they start at size 9,
   and we do our best to shrink them rapidly, so even if they were at
   average size 6 or 7, the test would take an average of 3 or 4 steps);
   (b) membership is not the only operation - for the inner loops in my
   code, singleton testing and extraction of the single element from a
   singleton were equally important, and here the relation was reversed:
   lists give us those operations in constant time, while they were
   linear in the old EnumSet.

   Bulat's tabling suggestion (which I interpreted rather more extremely,
   by tabling the actual calls to size and range2list:-) solves this issue
   beautifully, making EnumSet start to look competitive in spite of the
   particular usage pattern in this problem; and when the grouping
   inference asked for more calls to size, as well as intersection and
   union, the choice flipped completely, now in favour of EnumSet
   (has agreement been reached about an updated version of EnumSet?).
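
   for what it's worth, here is a minimal sketch of the kind of tabling
   I mean, assuming ranges are encoded as 9-bit words (as in an
   EnumSet-style representation); the names Range, size and toList are
   mine:

     import Data.Array
     import Data.Bits
     import Data.Word

     -- a range as a set of the digits 1..9, in the low 9 bits of a word
     type Range = Word16

     -- table size and element extraction once, for all 512 possible
     -- ranges, instead of folding over the bits on every call
     sizeTable :: Array Int Int
     sizeTable = listArray (0, 511) [ length (bits w) | w <- [0 .. 511] ]

     elemsTable :: Array Int [Int]
     elemsTable = listArray (0, 511) [ bits w | w <- [0 .. 511] ]

     bits :: Int -> [Int]
     bits w = [ d | d <- [1 .. 9], testBit w (d - 1) ]

     size :: Range -> Int
     size r = sizeTable ! fromIntegral r

     toList :: Range -> [Int]
     toList r = elemsTable ! fromIntegral r

   both tables are ordinary top-level values, so lazy evaluation builds
   each of them at most once and every later call is just an array lookup.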

   [if I might make a suggestion for EnumSet: it is not just that the
    implementation of the standard API can be more efficient for these
    small sets, but there are additional operations that ought to be
    included (unless the EnumSet.Set constructor gets exported),
    such as lifting the Ix instance from Word to Set a]

2 DiffArrays: they are neither a speed killer nor a universal cure.

   they were intended to enable functional arrays with inplace
   update for sequential access, but without the complex analyses
   needed to ensure such sequential access, and with the reverse
   diff to old versions preserving the persistence of copies.

   using them, we'll get inplace update plus a long trail recording the
   diffs needed to reconstruct the old versions. if we use them
   single-threadedly, that trail will just be ignored, and the garbage
   collector will discard it at some point. but if we combine them with
   backtracking, those trails are likely to kill much of the performance
   gains, and it is not so much the width of branching as the depth of
   each branch that hurts: we iterate on that array to narrow ranges and
   to update positions, and each tiny update extends that trail of
   reverse diffs (they are not even collapsed, so the trail can be longer
   than the array range) - so when we finally abandon a branch and need
   to return to one of the older versions of that array, we have to walk
   that trail, linearly, to reconstruct the old version from the new one
   (doing an identity update before entering a branch should give us two
   independent copies, though we'd need to make that update not on the
   current version of the array..).
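
   for illustration, an identity update would look like this (assuming
   the DiffArray type from Data.Array.Diff; whether this really yields
   two independent copies depends on which version it is applied to, as
   said, so treat it as a sketch of the idea rather than a verified fix):

     import Data.Array.Diff   -- re-exports the Data.Array.IArray interface

     type Pos  = (Int, Int)
     type Grid = DiffArray Pos [Int]

     -- write one element back unchanged: the result becomes the version
     -- holding the flat array, while the argument turns into a reverse
     -- diff pointing at it
     identityUpdate :: Grid -> Grid
     identityUpdate g = g // [(p, g ! p)]
       where p = fst (bounds g)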

3 Arrays: again, they are no universal performance cure, and their advantages depend on the usage patterns.

   tabling calls to size is an ideal example in favour: a single array
   construction vs frequent reads. in contrast, a sequence of frequent
   small updates, as in constraint propagation, seems to be rather
   disadvantageous: every time we narrow a range or fix a position, the
   whole array gets copied! on the other hand, those arrays are small,
   and the cost of copying may be lost in other costs. but if we have to
   go through many elements, the cost can be reduced by doing updates in
   bulk: a _list_ of updates, with only one new copy being created,
   incrementally.

   there is, however, one part of the problem where I'd expect
   arrays to be advantageous, and that is in generating those lists
   of updates: here we only inspect the sudoku matrix, reading
   its elements according to various more or less complex views.
   this part should be efficiently handled over an array, and as
   your code shows, doing so may even result in fairly readable
   code (although the clarity of my own code has improved as
well since the time when it used to be very concerned with constructing those several views in rather complex ways;-).
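
   a sketch of what I have in mind for those two parts (the names Pos,
   Range and Grid are mine, and the ranges are list-based here):

     import Data.Array
     import Data.List (intersect)

     type Pos   = (Int, Int)
     type Range = [Int]
     type Grid  = Array Pos Range

     -- the usual views of the matrix; reading them is just indexing
     rows, cols, boxes :: [[Pos]]
     rows  = [ [ (r, c) | c <- [0 .. 8] ] | r <- [0 .. 8] ]
     cols  = [ [ (r, c) | r <- [0 .. 8] ] | c <- [0 .. 8] ]
     boxes = [ [ (3 * br + dr, 3 * bc + dc) | dr <- [0 .. 2], dc <- [0 .. 2] ]
             | br <- [0 .. 2], bc <- [0 .. 2] ]

     view :: Grid -> [Pos] -> [(Pos, Range)]
     view g ps = [ (p, g ! p) | p <- ps ]

     -- collect narrowings from all views first, then apply them in one
     -- bulk update: accum builds a single new copy, combining repeated
     -- positions by intersecting the proposed ranges with the old ones
     applyBulk :: Grid -> [(Pos, Range)] -> Grid
     applyBulk = accum intersect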

> I believe if you change the representation of puzzles from [(pos,range)]
> to an Array, you'll get a significant speedup

since my solver's logic now seems to be on par, I might try that
for the generation of constraints, but I still doubt it is ideal for the
actual updating of the matrix - perhaps bulk updates might be
sufficient to remedy that.

but the important lesson is that one can leave non-algorithmic
performance concerns to the very end (where they need to be
guided by profiling, and limited to the actual "inner loops") - I
got my solver fairly fast without ever worrying about specialised
representations, or strictness, etc.; focussing on algorithmic
aspects was sufficient and necessary, however:
the naive generate-and-test is impractical, and while simple
constraint propagation will lead to drastic speedups, those are
not sufficient (they still left me at the 10000-puzzles/hour stage),
so more careful propagation is needed.

my first narrowing propagation sped up the difficult puzzles, but slowed
down the simple ones (and there are a lot of those in this set); I had
already started on complex heuristics to apply that propagation
selectively before I recalled my own good principles and looked at the
algorithm again, as that slowdown should not have happened in the first
place. and in fact, it turned out to be related to one of those
optimizations we are all too likely to throw in automatically, without
much thought: to avoid stumbling over the fixed positions as singleton
ranges again and again, I had disabled that propagation, and had instead
left it to each of the other propagation steps to note the appearance of
new singletons, while ignoring the old ones.

so, when my new narrowing step narrowed the range on some
position to a singleton, without notifying anyone, that singleton
got duly ignored by everyone else, assuming that it had already
been taken care of. result: longer searches, in spite of more
information!-( rectifying that, and simplifying some structures,
got the time for the whole set down to the 17min I reported.

on the other hand, one cannot always ignore performance completely,
relying only on the algorithm: usage patterns and profiling of actual
code are the key here. when I finally added my group inference, it did
slice down the number of guesses for the previously difficult puzzles,
but the actual runtime for the whole set of puzzles increased by a few
minutes! profiling showed some rather extravagantly complex code
("clever" in the worst sense!-) in the innermost loop, cured by
simplifying (and clarifying) the code in question. the next step was
moving to EnumSet, followed by the size/foldBits issue, cured by Bulat's
tabling suggestion. now we're down to 8min20s or so, still with lists
for backtracking and for the main matrix (compared to 2min40s or so for
yours).

> Rot! Typo in _my_ previous message, 5319 is correct.

I changed the output format to a line per puzzle (solution/number
and list of guesses - I still like that metric as each guess means that
we've run out of ideas, which suggests that we want to get the number of guesses down to zero), using the same format for both solvers to allow for easy grep/diff.

> puzzles involving guesses: 8165
> largest number of guesses:
>     10 (#9175), 10 (#17200), 10 (#29823), 10 (#30811)

down to 5319 puzzles involving guesses, and 0/0/3/0 for those four, now.

> We use different guessing strategies (plus, I also have group-inference).

yep, that convinced me that group inference is a must (now added).

as for guessing strategies: you're sorting the non-fixed positions by
size of range before extracting candidates for guesses, whereas I simply
take them in the order they appear in the puzzle (which may have been
somewhat hard to see thanks to another one of those overcomplicated
pieces of code that has now disappeared..). I had thought about that,
but couldn't make up my mind about preferring any order, and while the
numbers of guesses for our two solvers vary wildly between puzzles, the
total of guesses shows no advantage of sorting (the total looks slightly
smaller for my solver, it has no two-digit guesses, and fewer puzzles
with more than 6 guesses, so if anything, the evidence is against
sorting; but I'd rather focus on reducing the guesses than try to attach
any significance to these variations..).
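
in code, the difference between the two selection strategies is roughly
this (a sketch over list-based ranges; the names are mine):

  import Data.List (sortBy)
  import Data.Ord (comparing)

  type Pos   = (Int, Int)
  type Range = [Int]

  -- candidates for guessing, in the order they appear in the puzzle
  inOrder :: [(Pos, Range)] -> [(Pos, Range)]
  inOrder = filter ((> 1) . length . snd)

  -- the same candidates, sorted by the size of their range first
  bySize :: [(Pos, Range)] -> [(Pos, Range)]
  bySize = sortBy (comparing (length . snd)) . inOrder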

if I eliminate the sorting from your solver, both solvers deliver the same
results, so that's nice!-) but it also means that we have room for further
improvement.

one reason why I got interested in this problem in the first place was
that I had been reading about constraint propagation, and wanted to get
some hands-on experience. in particular, there is a technique called
"generalized propagation":

    Generalised Constraint Propagation Over the CLP Scheme,
    by Thierry Le Provost and Mark Wallace.
    Journal of Logic Programming 16, July 1993. Also [ECRC-92-1]
    http://eclipse.crosscoreop.com:8080/eclipse/reports/ecrc_genprop.ps.gz

if I may oversimplify things a bit:

1 split inferences into branching (search) and non-branching (propagation)
2 distribute common information from non-branching inferences out of branches
3 to enhance applicability of step 2, add approximation of information

applied to our sudoku problem, we have deterministic propagation
and branching search, and what we do if we run out of propagation
opportunities is to select a non-fixed position, then branch on each
of the numbers in its range. adding generalised propagation (as I
understand it) amounts to a form of lookahead: we take each of those
branches, but look only at the deterministic information we can extract
in each (ignoring further search). then we check whether the branches
have any of that information in common, and apply any such common
information to our matrix. rinse and repeat. an obvious way to refine
this process by approximation is to build, for each position, the union
of its ranges over the matrices resulting from all the branches.
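
in other words, something like this (a sketch, with list-based ranges
and an assumed deterministic propagator propagate :: Grid -> Maybe Grid
that returns Nothing on contradiction):

  import Data.Array
  import Data.List (union)

  type Pos   = (Int, Int)
  type Range = [Int]
  type Grid  = Array Pos Range

  -- one step of single-level lookahead at position p: run only the
  -- deterministic propagation in each branch, then keep whatever
  -- information the surviving branches have in common
  lookaheadAt :: (Grid -> Maybe Grid) -> Pos -> Grid -> Maybe Grid
  lookaheadAt propagate p g
    | null branches = Nothing      -- every branch fails: contradiction
    | otherwise     = propagate (g // [ (q, common q) | q <- indices g ])
    where
      branches = [ g' | v <- g ! p
                      , Just g' <- [propagate (g // [(p, [v])])] ]
      -- approximate the common information by the pointwise union of
      -- the ranges the surviving branches derived for each position
      common q = foldr1 union [ b ! q | b <- branches ]

since each branch only narrows the ranges it starts from, the union at a
position can only be narrower than (or equal to) the current range, so
writing it back is sound.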

using this idea with a single level of branch lookahead, and selecting a position with the widest range for inspection, reduces the number of puzzles requiring guesses to 373, with a maximum depth of 3 guesses.

it doesn't reduce runtime much yet (7min50s), but I haven't even started
profiling, either. I guess I have to look into representations now, if I
ever want my solver to catch up with yours in terms of runtime;-)

cheers,
claus

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
