Re: [fricas-devel] Random choice

Waldek Hebisch Wed, 07 Oct 2015 20:20:05 -0700

Ralf Hemmecke wrote:
> 
> On 10/07/2015 12:26 PM, Waldek Hebisch wrote:
> >> If you are satisfied with random integers, then that must be stated
> >> in the category documentation.
> > 
> > You mean that we should state explicitely that random elements need
> > not exhaust the domain?
> 
> Yes, why not? If (as an implementor) I have such information, I know
> that I don't need to bother with exhausting the domain.


I hoped that properties given in my decription was vague enough
so that implementer could infer that there is a lot of freedom,
in particular that there is no promise to exhaust domain the domain.
 
> >> In fact, I more and more get the impression that you don't actually
> >> want "RandomChoice". What you want is a stream of elements of %
> >> that do not repeat. Am I wrong?
> 
> > It would be nice if we could get non-repetition for free.
> 
> Well, if we had "Generator(X)" in SPAD, that would make it easier to
> implement such non-repeating stream. Maybe one could also use Stream(X)
> or even return a function of type ()->X that returns the "next" element.
> 
> > But it seems that user have more info how to handle possible 
> > repetition.
> 
> Yes, maybe. But a function that returns some element in a somewhat
> "uniform" distribution, would also have to take care of repeating elements.

I do not understand what you mean: if we generate uniformly
distributed and independent elemets then we automatically
have posivite probability of repetition.  No need to extra
introduce them.  As I wrote, in case when repeated elements
are not wanted the user can remove them.  However, some
users may tolerate them, so why spent extra effort when
this is not needed.

Let me say this differently: capability of generating somewhat
random elements depends on specific domain.  It is not possible
to implement it without domain-specific code.  OTOH removal
of duplicates can be dane in generic way and I do not see how
domain could improve it.  RandomChoice is intended to cover
just the special part, leaving generic stuff to packages.

> > OTOH merely having non-repeating sequence is not enough.
> 
> Can you say why? You mentioned gcd and factorization. But I don't
> understand. Primality testing would probably a different business.
 
In common case we have a finite bad set S.  We need to find
some element which does not belong to S.  Of course we hope
that "random" element is not in S.  However, merely replaying
some fixed sequence without repetition may lead to bad behavoiur.
Namely, if S is large and agrees with initial segment of our
sequence, then we may wait quite long for first element
not in S.  Similaraly, if elements are not random enough
we may get a lot of elements from S.

> > In practice we enlagre n as geometric sequence, so we need enough
> > randomness.
> 
> Waldek, maybe I am the only one here on the list that does not get the
> real purpose of your intention from the docstring that you have given.
> Can you give more concrete examples of where you would like to apply
> randomElement?

Look at:

https://en.wikipedia.org/wiki/Schwartz-Zippel_lemma

and think what you really need to use this lemma.  You will
see that we really do not care about exact set from which
we take the elements.  All we need is inequality p(a) <= 1/n
for each values that may appear.

> >> Well. That somehow sounds like you want to easily throw away "unlucky
> >> primes".
> >
> > That separate topic.
> 
> Really? Well, for sure "deciding unluckyness" must be done by the user,
> but otherwise (at least for gcd) it would be enough if randomElement
> does not return the same (now known to be unlucky) element over and over
> again.

Getting the same element is extremally unlikely.  Anyway, if
needed user can eliminate duplicates in generic way.  Also,
avoiding unlucky primes may involve some extra effort.
For example guessing package in general does not know if
given prime is unlucky.  But after taking random prime
we take also evaluatin point and at the end we may discover
that the whole tuple (prime, points) is unlucky.  If for
given prime we get too many unlucky points we move to next
prime.

And to make things concrete: guessing packeage can not
tolerate duplicates, so it keeps list of primes that was
used and checks if value is on this list.  AFAICS we
can not avoid storing already generated elements, and
guessing package needs to store them anyway.  So trying to
remove duplicates in 'random' would just increase memory
use.  ATM check in guessing packeage is done via linear
search and in principle binary search or hashing could
be faster.  However current check is good enough.

> 
> I still don't get exactly what the purpose of your function is, but let
> me try in documenting it and maybe I would change the signature a bit.
> 
> randomElement: Integer -> Union(() -> %, failed)
>   ++ randomElement(n) randomly selects n elements from the domain
>   ++ and returns a function "random" so that random() chooses randomly
>   ++ among those n preselected elements. If the underlying domain does
>   ++ not contain n elements then randomElement(n) returns failed.
> 
> That would be something I understand as an implementor.
> Implementationwise, that does (of course) not mean, that the call
> "randomElement(n)" first computes a *list* of n elements. Such details
> are left open.

I do not see why you want to return a function?  From my point
of view this is just useless complication.  Also what
"randomly selects n elements" means?  To have random choice
we need a probability distribution and the whole point of
RandomChoice is that we do not have any natural distribution.

Given that we do not want the intermediate function we would get the
following:

++ randomElement(n) first selects n distinct elements from the
++ domain and then randomly chooses one of them with uniform
++ distribution. If the underlying domain does
++ not contain n elements then randomElement(n) returns failed

I also wrote that elements should be distinct because otherwise
one gets completely unnecessary complications trying to
estimate probabilities.  However, this splitting into first
choosing set and than selectiong element from the set does
not look clearer for me than my formulation.  Note: having
_both_ is certainly clearer than just one.  In my formulation
reader must infer that logically there is selection of
a set.  You put stress on selecting set, but than reader must
infer that in fact this selection is only at logical level. 

Your formulation is more precise (provided that we say that
n elements are distinct).  OTOH I deliberatly did not want
to make very precise statement.  Namely, I do not care
of set is slightly bigger, form probability point of view
this is only better.  Also it does not matter is probabilities
are exatly uniform on choosen set, what matter is that they
are low enough.  Let me add that I did not make statement
that probablity of given element is <= 1/n because in general
it hard to say samething exact abot probabilities from
pseudo-random generator.  If generator is good then there
will be no observable difference.  But some tiny difference
may in principle appear due to combination if imperfect
randomness and way we produce elements.

However, if you feel that string claim about probability
will be correctly understood, then we can make it.

BTW: I have now checked one argument 'random'.  We have two

random : PositiveInteger -> ...

but they are in packages, so there will be no conflict.
The is one argument 'random' in PermutationGroup, but this
one have different argument type.  There is conflict with
'random(n)' in integer domains, because our new 'random'
returns union, while random in integer returns integers.
We could resolve this by specifying return type.  I am
not sure if we want to do this: there are of order few
tens calls to 'random(n)', some will pick correct version
because type is already determined.  But in say 20 places
we would have to use explicit '@Integer'.

-- 
                              Waldek Hebisch

-- 
You received this message because you are subscribed to the Google Groups 
"FriCAS - computer algebra system" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/fricas-devel.
For more options, visit https://groups.google.com/d/optout.

Re: [fricas-devel] Random choice

Reply via email to