Package: crafty
Version: 23.4-6+b1
Severity: normal
There have been a number of improvements in crafty since 23.4:
23.5 Several pieces of code cleanup, both for readability and speed.
We are now using stdint.h, which lets us specific the exact size
of an integer value which gets rid of the int vs long issues that
exist between MSVC and the rest of the world on X86_64. We still
use things like "int" if we want to let the compiler choose the
most efficient size, but when we need a specific number of bits,
such as 64, we use a specific type (uint64_t for 64 bits). Minor
change to forward pruning that does not toss out passed pawn
moves if they are advanced enough (compared to remaining depth)
that they might be worth searching. Cleanup to passed pawn eval
code to help efficiency and readability. New options to the
display command. "display nothing" turns all output off except
for the move Crafty chooses to play. "display everything" will
produce a veritable flood of information before it makes a move.
Cleanup of personality code to fix some values that were not
included, as well as clean up the formatting for simplicity.
23.6 Minor tweak to "adaptive hash" code + a fix to the usage warning
that failed to explain how many parameters are required. New way
of timing the search, replacing the old "easy move" code that was
simply too ineffective. Now Crafty computes a "difficulty" level
after each iteration. If the best move doesn't change, this
level is reduced by 10% of the current value, until it reaches
60% where it stops dropping. If the best move changes during an
iteration, it adjusts in one of two ways. If the current
difficulty level is less than 100%, it reverts to 100% + the
number of different best moves minus 1 times 20%. If the current
difficulty level is already >= 100% it is set to 80% of the
current value + (again) the number of different best moves - 1
times 20%. Repeated changes can run this up to 180% max. As the
search progresses this difficulty level represents the percentage
of the nominal target time limit it should use. It still tries
to complete the current iteration before quitting, so this limit
is a lower bound on the time it will use. Restored an old idea
from earlier Crafty versions (and even Cray Blitz), that of
trying the killers from two plies earlier in the tree, once the
killers for the current ply have been tried. Was a +10 Elo gain,
added to about +10 for the new time control logic. Old fail-high
restriction removed. At one point in time, a fail-high on the
null-window search at the root would cause problems. Crafty was
modified so that the fail-high was ignored if the re-search
failed low. Removing this produced an 8 Elo gain.
23.7 Minor fix to time check code. Was particularly broken at very
low skill level settings, but it had a small impact everywhere.
Some simplification with the reduction code, and how moves are
counted as they are searched, which would not count any moves
that were pruned, which could delay triggering reductions at a
ply. Fixed significant killer move bug that would only surface
during a parallel search. NextMove (remaining moves phase) culls
killer moves as it selects, because they have already been tried.
Unfortunately, in a parallel search, the killers could get
modified by other threads, which could erroneously cause a move
to be excluded that had not been searched. I discovered this bug
on wac #266 where Crafty normally finds a mate at depth=11, but
a parallel search would often miss the mate completely. Minor
fix to new "difficulty" time management. It was possible for a
move to look "easy" (several iterations without finding a new
best move) but it could fail low at the beginning of the next
iteration. Difficulty would be reduced due to not changing the
best move at previous iterations, so the search could time out
quicker than expected. Simple fix was to reset difficulty to
100% on a fail low, which makes sense anyway if it was supposed
to be "easy" but the score is mysteriously dropping anyway.
23.8 MAXPLY changed to 128, which increases the max length of the PV's
that can be displayed. The hash path stuff works so well that it
often STILL has PVs with <HT> on the end, after 64 moves are
displayed. While this can still probably be reached, it should
be much less frequent. This does mean the search can go for 128
iterations, rather than the old 64 limit, of course. Bug in
Search() that failed to update the PV on an EGTB hit, which would
cause the hashed PV to contain illegal moves. When Crafty then
tried to display this PV, it would promptly crash.
24.0 Major changes to portability/platform specific code. Crafty now
supports Unix and Windows platforms. All the other spaghetti
code has been removed, with an effort to make the code readable.
Couple of bugs in EvaluateWinningChances() fixed, one in
particular that makes the krp vs kr draw detection unreachable.
strcpy() replaced with memmove() to fix problems with Apple OS X
Mavericks where strcpy() with operands that overlap now causes an
application to abort on the spot. Bug in Search() that caused an
inefficiency in the parallel search. At some point the way moves
are counted within a ply was changed, which changed the split
algorithm unintentionally. The original intent was to search
just one move and then allow splits. The bug changed this to
search TWO moves before splitting is allowed, which slowed things
down quite a bit. thread[i] is now an array of structures so
that the thread pointers are separated from each other to fit
into separate cache blocks, this avoids excessive cache traffic
since they get changed regularly and were sitting in a single
cache block with 8 cpus on 64 bit architectures. Bug in new
"difficulty" time usage code would let the program run up against
"absolute_time_limit" before stopping the search, even if we
finished an iteration using beyond the normal target. This
really burned a lot of unnecessary time. I had previously fixed
the PVS window fail high (at the root) to force such a move to be
played in the game. One unfortunate oversight however was that
when splitting at the root, N different processors could fail
high on N different moves, and they were searching those in
parallel. I've modified this so that the PVS fail high at the
root exits from Search() (after stopping other busy threads) and
then returns to Iterate() which will print the fail-high notice
and re-start the search with all threads searching just one fail
high move to get a score quicker. Fail-high/fail-low code clean-
up. In particular, the "delta" value to increment beta or
decrement alpha is now two variables, so that an early fail-low
won't increment the delta value and then the subsequent (hope-
fully) fail-high starts off with a bad increment and jumps beta
too quickly. All fail-highs at the root return to Iterate(),
even the PVS null-window fail-highs. This re-starts the search
with all processors working on that one move to get it resolved
much quicker. Other minor fixes such as when failing low on the
first move, we pop out of search and go back to Iterate() to
lower alpha and start a new search. Unfortunately, after the
first move is searched, a parallel split at the root would occur,
only to be instantly aborted so that we could back out of Search()
and then re-start. Wasted the split time unnecessarily although
it was not a huge amount of time. Another simplification to the
repetition-detection code while chasing a parallel search bug
was missing repetitions.
24.1 LMR reductions changed to allow a more dynamic reduction amount
rather than just 1 or 2. An array LMR_reductions[depth][moves]
is initialized at start-up and sets the reduction amount for a
specific depth remaining (larger reduction with more depth left)
and the total moves searched at this node (again a larger
reduction as more moves are searched.) This can easily be
changed with the new "lmr" command (help lmr for details). The
old "noise" stuff has been changed. Now, rather than noise
specifying the number of nodes that have to be searched before
any output is displayed (PV, fail high, etc) it now specifies a
time that must pass before the first output is produced. This
can still be displayed with noise=0, otherwise noise=n says no
output until n seconds have passed (n can be a fraction of a
second if wanted.) I restored the old adaptive null-move code,
but changed the reduction (used to be 2 to 3 depending on the
depth remaining) so that it is 3 + depth / 6. For each 6 plies
of remaining depth, we reduce by another ply, with no real bound
other than the max depth we can reasonably reach. The "6" in
the above can be changed with the personality command or the new
null m n command. When the changes to noise were made, Iterate()
was also changed so that if the search terminates for any reason
and no PV was displayed, whether because the noise limit was not
satisfied or this search did not produce a PV because it started
at an excessively deep search depth, Crafty now always displays
at least one PV in the log file to show how the game is going to
continue. One case where this was an issue was where Crafty did
a LONG ponder search because the opponent took an excessive
amount of time to make a move. If it finishes (say) a 30 ply
search while pondering, when the opponent moves, Crafty will move
instantly, but when it starts the next search, we start pondering
at iteration 29, which might take a long time to finish. But we
do still have the PV from the last search (the 30 ply one) and
main() thoughtfully removed the first two moves (the move played
and the move we are now pondering, so we always have something to
display, and now we no longer terminate a search with no output
(PV, time, etc) at all. Generation II of the parallel split code
(see Thread() in the thread.c source file comments for specifics)
that is a lighter-weight split/copy plus the individual threads
do most of the copying rather than the thread that initiates the
split. More work on the skill command. For each 10 units the
skill level is reduced, the search speed is reduced by 50%.
Additionally the null-move and LMR reduction values are also
ramped down so that by the time skill drops below 10, there are
no reductions left.
24.2 Bug in SetTime() fixed. The absolute time limit was set way too
large in sudden death games (games with no secondary time control
to add time on the clock) with an increment. It was intended to
allow the search to use up to 5x the normal target time, unless
that was larger than 1/2 of the total time remaining, but some-
where along the way that 5x was removed and it always set the
absolute time limit to 1/2 the remaining clock, which would burn
clock time way too quickly. Complete re-factor of Search() to
take the main loop and move that to a new procedure
SearchMoveList() to eliminate the duplicated search code between
Search() and SearchParallel() which has been replaced by
SearchMoveList(). New addition to NextMove(). It was cleaned up
to eliminate the old NextEvasion() code since one was a sub-set
of the other. A minor change to limit the NPS impact on history
ordering. We do not do history ordering within 6 plies of a
terminal node, and we only try to order 1/2 of the moves before
giving up and assuming this is an ALL node where ordering does
not matter.
25.0 Complete re-factor of pawn evaluation code. There were too many
overlapping terms that made tuning difficult. Now a pawn is
classified as one specific class, there is no overlap between
classes, which simplifies the code significantly. The code is
now easier to understand and modify. In addition, the passed
pawn evaluation was rewritten and consolidates all the passed
pawn evaluation in one place. The evaluation used to add a bonus
for rooks behind passed pawns in rook scoring, blockading some-
where else, etc. All of this was moved to the passed pawn code
to make it easier to understand and modify. A limited version of
late move pruning (LMP) is used in the last two plies. Once a
set number of moves have been searched with no fail high, non-
interesting moves are simply skipped in a way similar to futility
pruning. This version also contains a major rewrite of the
parallel search code, now referred to as Generation II. It has a
more lightweight split algorithm, that costs the parent MUCH less
effort to split the search. The key is that now the individual
"helper" threads do all the work, allocating a split block,
copying the data from the parent, etc., rather than the parent
doing it all. Gen II based on the DTS "late-join" idea so that a
thread will try to join existing split points before it goes to
the idle wait loop waiting for some other thread to split with
it. In fact, this is now the basis for the new split method
where the parent simply creates a split point by allocating a
split block for itself, and then continuing the search, after the
parent split block is marked as "joinable". Now any idle threads
just "jump in" without the parent doing anything else, which
means that currently idle threads will "join" this split block if
they are in the "wait-for-work" spin loop, and threads that be-
come idle also join in exactly the same way. This is MUCH more
efficient, and also significantly reduces the number of split
blocks needed during the search. We also now pre-split the
search when we are reasonably close to the root of the tree,
which is called a "gratuitous split. This leaves joinable split
points laying around so that whenever a thread becomes idle, it
can join in at these pre-existing split points immediately. We
now use a much more conservative approach when dealing with fail
highs at the root. Since LMR and such have introduced greater
levels of search instability, we no longer trust a fail-high at
the root if it fails low on the re-search. We maintain the last
score returned for every root move, along with the PV. Either an
exact score or the bound score that was returned. At the end of
the iteration, we sort the root move list using the backed-up
score as the sort key, and we play the move with the best score.
This solves a particularly ugly issue where we get a score for
the first move, then another move fails high, but then fails low
and the re-search produces a score that is actually WORSE than
the original best move. We still see that, but we always play
the best move now. One other efficiency trick is that when the
above happens, the search would tend to be less efficient since
the best score for that fail-high/fail-low move is actually worse
than the best move/score found so far. If this happens, the
score is restored to the original best move score (in Search())
so that we continue searching with a good lower bound, not having
to deal with moves that would fail high with this worse value,
but not with the original best move's value. Minor change to
history counters that now rely on a "saturating counter" idea. I
wanted to avoid the aging idea, and it seemed to not be so clear
that preferring history moves by the depth at which they were
good was the way to go. I returned to a history counter idea I
tested around 2005 but discarded, namely using a saturating
counter. The idea is that a center value (at present 1024)
represents a zero score. Decrementing it makes it worse,
incrementing it makes it better. But to make it saturate at
either end, I only reduce the counter by a fraction of its
distance from the saturation point so that once it gets to either
extreme value, it will not be modified further avoiding wrap-
around. This basic idea was originally reported by Mark Winands
in 2005. It seems to provide better results (slightly) on very
deep searches. One impetus for this was an intent to fold this
into a move so that I could sort the moves rather than doing the
selection approach I currently use. However, this had a bad
effect on testing, since history information is dynamic and is
constantly changing, between two moves at the same ply in fact.
The sort fixed the history counters to the value at the start of
that ply. This was discarded after testing, but the history
counter based on the saturating counter idea seemed to be OK and
was left in even though it produced minimal Elo gain during
testing. Change to the way moves are counted, to add a little
more consistency to LMR. Now Next*() returns an order number
that starts with 1 and monotonically increases, this order number
is used for LMR and such decisions that vary things based on how
far down the move list something occurs. Root move counting was
particularly problematic with parallel searching, now things are
at least "more consistent". The only negative impact is that now
the move counter gets incremented even for illegal moves, but
testing showed this was a no-change change with one thread, and
the consistency with multiple threads made it useful. New method
to automatically tune SMP parameters. The command is autotune
and "help autotune" will explain how to run it. Added the
"counter-move" heuristic for move ordering (Jos Uiterwijk, JICCA)
which simply remembers a fail high move and the move at the
previous ply. If the hash, captures or killer moves don't result
in a fail high, this move is tried next. No significant cost,
seems to reduce tree size noticeably. Added a follow-up idea
based on the same idea, except we pair a move that fails high
with the move two plies back, introducing a sort of "connection"
between them. This is a sort of "plan" idea where the first move
of the pair enables the second move to fail high. The benefit is
that this introduces yet another pair of good moves that get
ordered before history moves, and is therefore not subject to
reduction. I have been unable to come up with a reference for
this idea, but I believe I first saw it somewhere around the time
Fruit showed up, I am thinking perhaps in the JICCA/JICGA. Any
reference would be appreciated. Minor change to the way the PV
and fail-hi/fail-low moves are displayed when pondering. Crafty
now adds the ponder move to the front of the PV enclosed in ( )
so that it is always visible in console mode. The depths are
reaching such extreme numbers the ponder move scrolls off the top
of the screen when running in console mode or when "tail -f" is
used to watch the log file while a game is in progress. This is
a bit trickier than you might think since Crafty displays the
game move numbers in the PV. Penalty for pawns on same color as
bishop now only applies when there is one bishop.
25.0.1 Cleanup of NextMove() plus a minor ordering bug fix that would
skip counter moves at ply = 2. Added NUMA code to force the hash
tables to be spread across the numa nodes as equally as possible
rather than all of the data sitting on just onenode. This makes
one specific user policy important. BEFORE you set the hash size
for any of the four hash tables, you should ALWAYS set the max
threads limit first, so that the NUMA trick works correctly. Of
course, if you do not use -DAFFINITY this is all irrelevant. The
-DNUMA option has been removed. I no longer use any libnuma
routines. A new "smpnuma" command is now used to enable/disable
NUMA mode (which really only affects how the hash tables are
cleared, all the other NUMA code works just fine no matter
whether this is enabled or disabled. Fixed a bug with the xboard
memory command that could overflow and cause preposterous malloc
requests.
-- System Information:
Debian Release: stretch/sid
APT prefers unstable
APT policy: (990, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 4.4.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)
Versions of packages crafty depends on:
ii libc6 2.22-5
ii libnuma1 2.0.11-1
ii libstdc++6 5.3.1-13
Versions of packages crafty recommends:
pn crafty-books-medtosmall | crafty-books-small <none>
ii xboard 4.8.0-2
crafty suggests no packages.
-- no debconf information