The puzzle unflipped is extra hard as the solution to the first row is 987 654 321. Come to think of it, one could add a new function "flip" that mutates the sudoku according to well known rules and maybe something like unflip for when the sudoku was finnished.
New techniques could certainly be added (like single candidate and naked pairs) without too much overhead so they might just pay off, certainly on the impossible puzzle. Maybe I could also pre-calculate all rows, collumns and squares and store them in int pointer arrays. This way things could become even faster. Would it be possible to do something like that in ctfe?