Re: [fricas-devel] XHashTable performance

Waldek Hebisch Thu, 19 Dec 2024 08:35:33 -0800

On Thu, Dec 19, 2024 at 03:57:41PM +0100, 'Ralf Hemmecke' via FriCAS - computer 
algebra system wrote:
> On 12/19/24 14:33, Waldek Hebisch wrote:
> > You can do what 'value' in 'HashState' is doing, that is drop
> > sign bit using bitwise and.  Bitwise and is much cheaper than
> > 'positiveRemainder'.
> 
> See attached proposal.


Looks OK.
 
> The only problem I have with
> 
>   h1: Z := ((HASHSTATEMAKEFIXNUM(h k)$Lisp) pretend I)::Z
> 
> is that it introduces another place in the algebra code that uses a call to
> Lisp. If that would be what you suggest, then I'd rather prefer to require
> the user to provide a non-negative hashing function. The latter then would
> not add a needless chopping of bits for the default hash function value
> coming from HashState.

I was thinking rather about

   h(k) And max()$Singleinteger

which is the same as current HASHSTATEMAKEFIXNUM but written in Spad
(AFAICS HASHSTATEMAKEFIXNUM in Lisp makes some sense, as it must
be matched with other Lisp code).  But also see below.

> "rem" is, in fact, not properly specified. What tells me that the result of
> 5 rem 4 is 1 and not -3?
> The docstring of "rem" for Integer is
> 
> rem: (%, %) -> %
>     x rem y is the same as divide(x, y).remainder. See divide.
> 
> divide: (%, %) -> Record(quotient: %, remainder: %)
>     divide(x, y) divides x by y producing a record containing a quotient
>     and remainder, where the remainder is smaller (see sizeLess?) than
>     the divisor y.
> 
> And we have ...
> 
> %%% (9) -> for i in [5,-5] repeat for j in [4, -4] repeat print(i rem j)
>    1
>    1
>    - 1
>    - 1
> 
> And sizeLess?(+/-1, +/-4) is true for all 4 combinations of the sign. So it
> is, in fact, unclear wheter rem gives a positive or negative value.

'rem' for positive arguments gives nonnegative value, for other we
get what Lisp or maybe hardware gives us.

> So that speaks against the usage of 'rem' (even though, of course, our
> implementation always returns a positive value for positive input).
> I'd like to make the specification of 'rem' more precise in the same patch.

If you mean changing docstring, you probably guessed that I have
doubts about this.  Our docstrings are _not_ specification.  I
normally try to give enough information in docstrings so that
user with general knowledge can "identify" given operation.  That
is when thare are several variants/conventions in use I am trying to
specify which is used in FriCAS.  But only hostile/insane
implementation would give you negative remaider from positive
integer arguments.

> BTW, can you say whet
> 
>    convert(x : %) : Integer == x pretend Integer
>    qconvert(x : Integer) : % == x pretend %
> 
> are actually doing internally?

Nothing.  Or if you prefer you can say that they make a hole
in type system.

> Integer and SingleInteger surely have not the
> same representation.

>From FriCAS point of view representations are the same.  They
differ because SingleInteger has limited range.  At Lisp
level, knowing that range is limited allows different representation.
Lisp calls integers in implementation-defined range "fixnums".
Lisp integer operations need to check representation, and act
depending on it.  This adds extra code which normally is considerd
too large to add inline, so they are usually done by function calls.
Knowing that integers are in range of SingleInteger (that is Lisp
"fixnum") allows Lisp compiler to generate reasonably efficient
machine code.

Concerning representation at Lisp level, there are various
possibilities.  Some Lisp implementations (for examples old
versions of GCL) used essentially the same representation
for all integers.  Normally general representation requires
dynamic allocation of memory, Lisp-s using "the same
representation" usually had table of preallocated "small"
integers and instead of allocating new integer if possible
they returned result from the table.  In Lisp arrays indices
are "fixnums", which means that "fixnums" need reasonable
range, much more than possible with preallocated table of
"small" integers.  So implementations like this must dynamically
allocate some fixnums.  Conseqently, arithmetic on fixnums tends
to be quite slow with such choice.  More typical method uses
"immediate" integers.  Namely, normal Lisp data is accessed via
pointers (machine addresses).  But not all pointers are valid and
otherwise invalid pointers can be used for different purposes.
For example, in sbcl low 2 (for 32-bit machine) or 3 (for 64-bit
machine) bits of address of Lisp data are always 0.  IIRC, to
simplify arithmetic sbcl reserves numbers with lowest bit 0 as
"fixnum"-s.  Other Lisp data is represented by address with
artitmetically added tag.  This makes "fixnum" arithmetic
faster at cost of slowing down memory acceses (which must
subtract tag before actual access).  Recent GCL uses a
different scheme: there is a magic size.  Anyting smaller
than this magic size is treated as address.  Otherwise
magic correction is subtracted from the "address" and result of
subtraction is treated as an integer.

In short, Lisp dynamic typing allows users to treat integers
as a single set.  For limited range (called "fixnum"-s) Lisp
may have tricks which make operations much faster than general
case.  Users can tell Lisp than numbers are "fixnum"-s
(in our case this is done by Spad compiler).  Lying to Lisp
can lead to runtime errors, wrong results or crashes.

Due to your question about HASHSTATEMAKEFIXNUM and representation
I realised that there is potential trouble with GCL: in GCL
max()$SingleInteger is not a power of 2 minus 1.  That is bitwise
"and" that we are using may turn some lower order bits to 0,
decreasing quality of hash function.  So we probably need special
version of HASHSTATEMAKEFIXNUM for GCL.

> Does "pretend" do any conversion in FriCAS.
> My belief up to this point was that "pretend" is just like a type cast in C,
> i.e. no change in the underlying bit representation.

Cast in C may change bit representation.  "pretend" have no effect
at runtime.  At compile time beside making hole in type system,
"pretend" has similar effect to "@", that is it affects overload
resolution, favouring specified type.


> From d58bb59b814fe5243135e7b0b2e7f086445332b1 Mon Sep 17 00:00:00 2001
> From: Ralf Hemmecke <r...@hemmecke.org>
> Date: Thu, 19 Dec 2024 15:54:55 +0100
> Subject: use rem instead of positiveRemainder
> 
> ---
>  src/algebra/xhash.spad | 19 +++++--------------
>  1 file changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/src/algebra/xhash.spad b/src/algebra/xhash.spad
> index 6c101ea6..1ab3b690 100644
> --- a/src/algebra/xhash.spad
> +++ b/src/algebra/xhash.spad
> @@ -1,16 +1,7 @@
>  )abbrev domain XHASHTBL XHashTable
>  )if LiterateDoc
>  \documentclass{article}
> -\usepackage{url}
> -\usepackage{color}
> -\definecolor{bgcode}{rgb}{1,1,0.7}
> -
> -\usepackage{listings}
> -\lstdefinelanguage{spad}{basicstyle=\footnotesize\ttfamily,%
> -  numbers=left,firstnumber=last,backgroundcolor=\color{bgcode}}
> -\parindent0pt
> -\parskip\medskipamount
> -
> +\usepackage{literatedoc}
>  \begin{document}
>  \title{A hash table implementation via double hashing}
>  \author{Ralf Hemmecke}
> @@ -45,7 +36,7 @@ respectively. \texttt{VACANT} and \texttt{DELETED} should 
> not be part
>  of the key space.
>  
>  Let's assume that the type of the marker values is \texttt{Marker}.
> -Naively, and type correctly, we would have to use as representation
> +Naively, and type correctly, we would have to use a representation
>  like the following.
>  \begin{verbatim}
>  Union(Marker, Record(key: Key, entry: Entry))
> @@ -341,9 +332,9 @@ stored and thus the loops eventually terminate.
>              mk := getMKey(a, p)
>  
>          n: Z := numOfBuckets a
> -        h1: Z := (h k)::Z
> -        p: Z := positiveRemainder(h1, n) -- position in array
> -        h2: Z := 1 + positiveRemainder(h1, n-2)
> +        h1: Z := ((HASHSTATEMAKEFIXNUM(h k)$Lisp) pretend I)::Z -- h1>=0
> +        p: Z := h1 rem n -- position in array (p>=0)
> +        h2: Z := 1 + (h1 rem (n-2)) -- h2>=0
>          mk: MKey := getMKey(a, p)
>          deletedPosition?: Boolean := false
>          while not vacant? mk repeat
> -- 
> 2.43.0
> 


-- 
                              Waldek Hebisch

-- 
You received this message because you are subscribed to the Google Groups 
"FriCAS - computer algebra system" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to fricas-devel+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/fricas-devel/Z2RLTFTNmmylKdqr%40fricas.org.

Re: [fricas-devel] XHashTable performance

Reply via email to