Re: [fricas-devel] XHashTable performance

Waldek Hebisch Thu, 19 Dec 2024 17:59:11 -0800

On Thu, Dec 19, 2024 at 11:25:14PM +0100, 'Ralf Hemmecke' via FriCAS - computer algebra system wrote:
> Hi Waldek,
> 
> > h(k) And max()$SingleInteger
> 
> > Due to your question about HASHSTATEMAKEFIXNUM and representation I
> > realised that there is potential trouble with GCL: in GCL
> > max()$SingleInteger is not a power of 2 minus 1.  That is, the bitwise "and"
> > that we are using may turn some lower-order bits to 0, decreasing the
> > quality of the hash function.  So we probably need a special version of
> > HASHSTATEMAKEFIXNUM for GCL.
> 
> With this GCL danger in mind, I tend to rather require the user to
> already provide a hash function that does not yield negative
> SingleInteger values. Modified patch attached.


OK, good.
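To make the GCL concern concrete, here is a small Python sketch. It does not reproduce HASHSTATEMAKEFIXNUM. It uses 16-bit values, and `BAD_MASK` is a made-up value with one low bit cleared that stands in for "a maximum that is not a power of 2 minus 1". It is not GCL's real bound. With a mask of the form 2^k - 1, every low-order bit survives the "and". Each zero bit inside the mask halves the number of distinct hash values.

```python
# Sketch: folding a hash into a nonnegative range with bitwise AND.
# BAD_MASK is hypothetical, chosen only to illustrate a mask that is not
# of the form 2^k - 1; it is NOT GCL's actual fixnum maximum.

BITS = 16
GOOD_MASK = (1 << BITS) - 1          # 0xFFFF = 2^16 - 1, all low bits set
BAD_MASK = GOOD_MASK & ~(1 << 4)     # 0xFFEF, bit 4 is always cleared


def distinct_after_mask(mask, bits=BITS):
    """Count distinct values of h & mask over all bits-wide hash values h."""
    return len({h & mask for h in range(1 << bits)})


def fold(h, mask):
    """Fold a (possibly negative) hash into a nonnegative range with AND.

    Python ints behave like infinite two's complement, so a negative h
    also comes out nonnegative here.
    """
    return h & mask


print(distinct_after_mask(GOOD_MASK))            # 65536: no information lost
print(distinct_after_mask(BAD_MASK))             # 32768: half the values collide
print(fold(16, BAD_MASK) == fold(0, BAD_MASK))   # True: 16 and 0 now collide
print(fold(-1, GOOD_MASK))                       # 65535
```

In general the number of distinct results is 2 raised to the number of one bits in the mask. That is why the choice of maximum matters for hash quality.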

> > If you mean changing the docstring, you have probably guessed that I have
> > doubts about this.  Our docstrings are _not_ a specification.
> 
> Yes. Unfortunately.
> 
> > I normally try to give enough information in docstrings so that a user
> > with general knowledge can "identify" a given operation.
> 
> I understand that you'd like to have a simple docstring that is easy and
> quick to understand. What I do not understand is why you seem to be so much
> against additional text that clarifies any doubts a user might have.

First, I feel that docstrings are a bad place to put extra explanations.
"The same" issue is typically shared by multiple functions, so attaching
the explanation to a single function makes it harder to find.  Attaching
it to multiple functions makes documentation bulky and harder to
maintain.  Also, repeating the same thing in many places puts an extra
load on users: when reading a copy they do not know whether it is
a copy of something they have already read, or whether there is some
twist making it different.

Second, for me, mixing documentation with code makes working with
code harder: when working on code I want to see the code without any
distraction.  Short docstrings may help with coding, as they
frequently contain crucial non-obvious information like argument
order (or, more generally, the role of arguments).  Longer ones would
turn into a distraction.

Third, adding to the documentation a piece of info that person X
tried to find, but which was absent, does not lead to quality
documentation.  Namely, the next person will most likely search for
something different, so we will get a random collection of
facts, which will get bulky well before it attains reasonable
coverage (queries are likely to follow Zipf's law, which
means that a significant fraction of them will be unique).
Rather, we need to select fundamental info, including
things that users need to know but do not ask about (say, because
they do not know that there is a question to ask).  We need
to explain principles that allow inferring answers which are
not given explicitly.  And we need to arrange the docs so that
information is findable.

> I, for example, would not have been able to guess from the docstring of rem
> how the function behaves if negative arguments are involved. And I still
> don't know whether I get the same results as in my experiments with SBCL
> if I run on a different LISP. Clearly, my wish is that code
> written and documented in SPAD behaves the same no matter on which LISP
> FriCAS runs.

Well, we could probably add a paragraph about arithmetic somewhere.
People tend to assume that they know arithmetic, but division with
remainder is frequently problematic.  Mathematically, division
with remainder is associated with modular arithmetic, and for this
the nonnegative remainder and possibly the symmetric remainder are the
"good" ones.  Unfortunately, hardware usually provides the "wrong"
remainder, and IIRC that is codified by Fortran and Lisp.  For me it
normally does not matter what the exact rules are.  What matters is
that they are wrong.  More precisely, as long as we divide a
nonnegative number d by a positive one n, we get a nonnegative remainder
r, that is 0 <= r < n.  When d or n is negative, we normally
get a wrong result, which needs correction.
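As an illustration of the correction, here is a Python sketch. It assumes the truncating convention, in which the quotient is rounded toward zero, so the remainder takes the sign of d. That is how C's `%`, Fortran's `MOD` and Common Lisp's `REM` behave. The code does not claim to describe what FriCAS's `rem` does on any particular Lisp. The helper names are hypothetical. Python's own `%` rounds the quotient toward minus infinity, like Common Lisp's `MOD`, so the truncating version is emulated explicitly.

```python
def trunc_rem(d, n):
    """Remainder with the quotient rounded toward zero (the convention of
    C's %, Fortran's MOD and Common Lisp's REM): result has the sign of d."""
    q = abs(d) // abs(n)
    if (d < 0) != (n < 0):
        q = -q
    return d - n * q


def nonneg_rem(d, n):
    """Corrected remainder r with 0 <= r < |n| and d - r divisible by n."""
    r = trunc_rem(d, n)
    if r < 0:
        r += abs(n)
    return r


def symmetric_rem(d, n):
    """Remainder of smallest absolute value, in the range (-|n|/2, |n|/2]."""
    r = nonneg_rem(d, n)
    if 2 * r > abs(n):
        r -= abs(n)
    return r


print(trunc_rem(-7, 3), nonneg_rem(-7, 3), symmetric_rem(-7, 3))  # -1 2 -1
print(-7 % 3)  # 2: Python's % rounds the quotient toward minus infinity
```

The correction is a single conditional add of |n|, so the "wrong" hardware remainder is cheap to fix, as long as one remembers to do it.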

When we move to a Euclidean domain R, there is an easy, well-known
theorem: if the remainder is unique, then (apart from the trivial case
of a field) R is isomorphic to a ring of univariate polynomials over a
field.  If there are two possibilities for the remainder, then R is
isomorphic to the integers.  So, in general, there are more than two
possibilities for the remainder and I do not know any general conditions
for choosing a good one.
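For the integers, with |x| as the Euclidean size, the "two possibilities" can be seen directly. A nonzero remainder r always has a partner r - |b| (or r + |b|) that is also smaller than |b| in absolute value. The small Python sketch below simply enumerates all admissible remainders. The function name is hypothetical.

```python
def integer_remainders(a, b):
    """All r with |r| < |b| and a ≡ r (mod b), i.e. every admissible
    remainder of a divided by b for the Euclidean size function |x|."""
    m = abs(b)
    return [r for r in range(-m + 1, m) if (a - r) % m == 0]


print(integer_remainders(7, 3))    # [-2, 1]
print(integer_remainders(-7, 3))   # [-1, 2]
print(integer_remainders(6, 3))    # [0]
```

Exactly two candidates appear whenever b does not divide a, and only 0 when it does. Picking the nonnegative one or the one of smallest absolute value is a convention, not something the Euclidean structure decides.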

In a somewhat more general spirit, we have an ideal and want a
"canonical" choice of representative for residue classes.
For multivariate polynomials over a field, Groebner bases
and the reduction procedure give such a choice (depending on
the monomial order).  Theoretically, Groebner bases can be extended to
much more general rings, but there are practical difficulties.

Anyway, in general there are troubles with remainders, and
unfortunately, in the case of integers, where there are no
mathematical troubles, there is a practical trouble caused
by hardware behaviour.

> Thanks for your explanation about "pretend".
> 
> That would show a difference between Aldor and FriCAS.
> Aldor obviously employs "immediate integers". The attached
> program gives output:
> 
> %>aldor -Fx -laldor int.as
> %>./int
> 1
> 1
> 
> 0
> 1
> 
> 3
> 5
> 9
> 17
> 
> The commented line in int.as would lead to a segmentation fault.
> 
> That is certainly different from FriCAS.
> 
> %%% (1) -> (1$SingleInteger) pretend Integer
> 
>    (1)  1
>                                                Type: Integer
> %%% (2) -> (1$Integer) pretend SingleInteger
> 
>    (2)  1
>                                                Type: SingleInteger
> 
> OK, no problem, because there is the dangerous "pretend", but one should
> be aware of that when dealing with Aldor and FriCAS at the same time.

That is one reason to have 'qconvert' and to use 'convert' in
the name: for Aldor it would need to do a change of representation.

I am not sure why the Aldor designers made their choice.  One possible
rationale is that the Lisp choice plays better with dynamic typing.
The Aldor designers could assume that static typing makes conversion
less problematic (one needs to do something to change type).
Personally, I do not have a strong opinion.  The Aldor choice means
that there is some efficiency gain, because small integers
directly use hardware instructions and "arbitrary" integers
do not need to check for the case of a small integer.  OTOH, when a
procedure needs to handle arbitrary integers but mainly
deals with small ones, the Lisp choice means that small integers
will be more efficient.
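The trade-off can be sketched in a few lines of Python. This is a model, not an implementation. The fixnum range below is hypothetical, since real Lisps use implementation- and platform-specific limits. Python integers are already arbitrary precision, so both branches compute the same value, and the sketch only counts which representation path a Lisp-style generic addition would take. In the Aldor-style design, a SingleInteger addition would skip the range check entirely, while an Integer addition would always take the "slow" path, even for small values.

```python
# Hypothetical fixnum range for this sketch; real Lisps use other limits.
FIX_MIN, FIX_MAX = -(1 << 61), (1 << 61) - 1


def is_fixnum(x):
    return FIX_MIN <= x <= FIX_MAX


def generic_add(a, b, stats):
    """Lisp-style integer addition: one type with two representations.

    Every call pays for a range check; when operands and result are small,
    the fast path stands in for a single machine add.
    """
    if is_fixnum(a) and is_fixnum(b):
        s = a + b
        if is_fixnum(s):
            stats["fast"] += 1
            return s
    stats["slow"] += 1   # stands in for a heap-allocated bignum operation
    return a + b


stats = {"fast": 0, "slow": 0}
total = 0
for i in range(1000):   # "arbitrary integer" code that mostly sees small values
    total = generic_add(total, i, stats)
print(total, stats)     # 499500 {'fast': 1000, 'slow': 0}
```

Code written for arbitrary integers that mostly sees small values stays on the fast path in the Lisp-style model, at the cost of one check per operation.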

-- 
                              Waldek Hebisch

-- 
You received this message because you are subscribed to the Google Groups 
"FriCAS - computer algebra system" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to fricas-devel+unsubscr...@googlegroups.com.
To view this discussion visit 
https://groups.google.com/d/msgid/fricas-devel/Z2TPZ1j_rKkkvXuR%40fricas.org.
