On Tue, Aug 30, 2022 at 06:01:08PM +0800, Qian Yun wrote:
> I wonder if there are simpler issues that can be separated from
> this grand plan, so that me and others can help with that.

Let me first explain what takes most of the time.  Tim Daly says
that the Axiom codebase represents more than 140 man-years of
work.  The original source contains about 350 thousand
wc lines in code files (I do not write LOC, because a LOC
count should be done after removing comments).  A reasonable coder
should be able to write about 100 lines daily, a really
fast one maybe 2-3 times more.  Assuming 200 working
days in a year we get 350000/(100*200) = 17.5 man-years of coding.
So why does this take so much time?  The 100 lines per day
already include debugging.  One thing that takes time is
documentation.  But the main factor is research.  Developers must
invent new methods.  To do this they must do experiments and
measurements.  Partially written code may be discarded
because there is a better alternative.

To illustrate this let me mention a few examples:
- in the case of ffact.spad IIRC I coded it faster than
  100 lines per day.  Before I started coding the theory was
  mostly worked out, so I knew exactly how to proceed.
- in the case of cyclo.spad coding was also fast thanks to the
  theory in the Arnold and Monagan paper (in fact googling for
  hints and reading literature took more time than coding).
- in the case of intpar.spad coding was also reasonably fast,
  but before coding there was a rather long period of analysis
  and planning.
- in the case of rdeefx.spad there was a quite long period when
  I just looked at the problem from a theoretical point of view.
  Coding was slowed down because at first I got too
  ambitious and tried too complicated methods for some
  substeps (now rdeefx.spad handles the main cases but still
  needs work to cover some important special cases).

OK, what needs to be done:
1. we need packages with an interface like ModularAlgebraicGcdTools2
  and ModularAlgebraicGcdTools3 but faster.  The expected use case
  for both is dense, so polynomials should be kept in arrays.
  Even a single routine, like polynomial multiplication, would be
  a good step forward (a rough sketch illustrating the idea
  follows this list).  Once we figure out how to do fast
  multiplication, other routines are likely to be similar
  or can use the multiplication routine.
2. we need a package like ModularFactorizationTools1, but
  for algebraic extension fields.  The main operation, that is
  multiplication, is the same as in ModularAlgebraicGcdTools2.
  In fact, the two tasks are closely related (the sketch after
  this list also shows the reduction step).
3. in the case of multiple algebraic extensions it is not clear
  if doing arithmetic directly can be really fast; all
  schemes seem to give an extra power factor in the worst case.
  As an alternative one could search for a primitive element.
  In fact, finding primitive elements should be easy, but
  there is the cost of changing representation.
4. For factoring and GCD we need fast Hensel lifting.  Bernardin
  in his PhD thesis describes a scheme in two variables which
  should run reasonably well.  So probably we should write a
  routine using his scheme (but some variation may give an
  improvement).  A sketch of the basic univariate lifting step
  follows this list.
5. Magma literature says that for moderately large extensions
  of a small finite field they use Zech logarithms.  We could
  try a similar thing, that is, write versions of the routines
  from the previous points based on Zech logarithms (a toy
  sketch of the tables follows this list).
6. ffact.spad currently uses a slightly better algorithm than
  ddfact.spad and ffact.spad.  The version over Z_p for small p
  has a significant low-level advantage.  But the generic version
  has some disadvantage compared to ddfact.spad and should
  be similar to ffact.spad.  It would be simpler to just
  work with ffact.spad (which is a single generic code that
  can be specialized to fast versions) instead of several
  packages.  But we should do benchmarking, to make sure
  that there is no unexpected slowdown (and possibly
  improve ffact.spad if there is).  So the interesting cases
  are Z_p for large p (so that we cannot use machine-sized
  integers) compared to ddfact.spad, and factorization
  over extension fields compared to ffact.spad.
7. Part of the work on numerics is checking the accuracy of the
  routines that we have.  As I mentioned, AiryAi had a problem
  for positive arguments, so it was replaced by a new version
  which still has problems, but should be better than the old
  one.  One issue now is to find out if/where our complex
  polygamma works.  I have code to compute Bessel functions
  and polygamma to reasonably high accuracy.  This code
  is quite slow, so it is not appropriate as a general routine.
  But if somebody wants to look at the accuracy of existing code,
  then I can share it and it can serve as a source of good
  values.  (A small mpmath sketch for reference values follows
  this list.)
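
To make point 1 (and the multiplication part of point 2) concrete,
here is a rough sketch in Python rather than Spad.  It only
illustrates the idea: dense polynomials kept in flat coefficient
arrays, schoolbook multiplication over Z_p, and reduction modulo a
monic minimal polynomial as needed for an algebraic extension.  The
function names are mine, and a real implementation would switch to
Karatsuba or FFT-based multiplication above some size threshold.

    # a[i] is the coefficient of x^i; schoolbook multiplication over Z_p
    def dense_mul_mod_p(a, b, p):
        if not a or not b:
            return []
        c = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            if ai == 0:
                continue
            for j, bj in enumerate(b):
                c[i + j] = (c[i + j] + ai * bj) % p
        return c

    # reduce modulo a monic minimal polynomial m over Z_p; multiplication
    # in Z_p[x]/(m) is dense_mul_mod_p followed by this reduction
    def reduce_mod_min_poly(a, m, p):
        a = a[:]
        n = len(m) - 1
        for i in range(len(a) - 1, n - 1, -1):
            q = a[i]
            if q:
                for j in range(n + 1):
                    a[i - n + j] = (a[i - n + j] - q * m[j]) % p
        return a[:n]

    # example over Z_7 with m = x^2 + 1:
    prod = dense_mul_mod_p([3, 2], [1, 4], 7)        # (3 + 2x)*(1 + 4x)
    print(reduce_mod_min_poly(prod, [1, 0, 1], 7))   # -> [2, 0], i.e. 2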
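
For point 4, below is a hedged sketch of one linear Hensel lifting
step in the classical monic univariate setting.  This is not
Bernardin's bivariate scheme, only the basic ingredient such a
routine builds on, and the helper names are my own.  Polynomials
are again coefficient lists with a[i] the coefficient of x^i.

    def pmul(a, b):
        c = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[i + j] += ai * bj
        return c

    def padd(a, b):
        n = max(len(a), len(b))
        return [(a[i] if i < len(a) else 0) +
                (b[i] if i < len(b) else 0) for i in range(n)]

    # divide a by b over Z_p (leading coefficient of b invertible mod p)
    def pdivmod(a, b, p):
        a = [x % p for x in a]
        inv = pow(b[-1], -1, p)
        q = [0] * max(len(a) - len(b) + 1, 1)
        for i in range(len(a) - len(b), -1, -1):
            q[i] = a[i + len(b) - 1] * inv % p
            for j, bj in enumerate(b):
                a[i + j] = (a[i + j] - q[i] * bj) % p
        return q, a[:len(b) - 1]

    # given f = g*h (mod q), g and h monic, and s*g + t*h = 1 (mod p),
    # return g2, h2 with f = g2*h2 (mod q*p)
    def hensel_step(f, g, h, s, t, p, q):
        e = padd(f, [-x for x in pmul(g, h)])
        c = [(x // q) % p for x in e]
        q1, dg = pdivmod(pmul(t, c), g, p)
        dh = [x % p for x in padd(pmul(s, c), pmul(q1, h))]
        g2 = [x % (q * p) for x in padd(g, [q * x for x in dg])]
        h2 = [x % (q * p) for x in padd(h, [q * x for x in dh])]
        return g2, h2

    # x^2 - 1 = (x + 1)*(x + 4) mod 5 and 3*(x + 1) + 2*(x + 4) = 1 mod 5;
    # one step lifts the factorization to mod 25:
    print(hensel_step([-1, 0, 1], [1, 1], [4, 1], [3], [2], 5, 5))
    # -> ([1, 1], [24, 1, 0]), i.e. x^2 - 1 = (x + 1)*(x - 1) mod 25

Repeated lifting (or a quadratic variant) reaches any required p^k;
for factoring over Z one lifts until the modulus exceeds a bound on
the coefficients of the true factors.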
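
For point 5, the idea behind Zech logarithms is easiest to show on
a small prime field (the interesting case is GF(p^n), but the
tables look the same): every nonzero element is a power of a
generator g, multiplication is addition of exponents, and addition
uses a precomputed table Z with 1 + g^n = g^Z(n).  A toy sketch,
names again mine:

    # dlog[x] = n with g^n = x (mod p); zech[n] = m with 1 + g^n = g^m,
    # or None when 1 + g^n = 0
    def zech_tables(p, g):
        dlog, x = {}, 1
        for n in range(p - 1):
            dlog[x] = n
            x = x * g % p
        zech = {}
        for n in range(p - 1):
            s = (1 + pow(g, n, p)) % p
            zech[n] = dlog[s] if s else None
        return dlog, zech

    # g^a + g^b expressed as an exponent; None encodes the zero element
    def add_in_exponents(a, b, zech, order):
        if a is None: return b
        if b is None: return a
        z = zech[(b - a) % order]
        return None if z is None else (a + z) % order

    # GF(7) with generator 3: 3^1 + 3^2 = 3 + 2 = 5 = 3^5
    dlog, zech = zech_tables(7, 3)
    print(add_in_exponents(1, 2, zech, 6))   # -> 5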
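
For point 7, and independently of the slow high-accuracy code I
mentioned (the snippet below is not that code), mpmath is another
convenient source of reference values; the sample points here are
arbitrary:

    from mpmath import mp, airyai, besselj, polygamma, mpc

    mp.dps = 50                    # 50 significant digits
    for x in [0.5, 5.0, 25.0]:     # includes positive Ai arguments
        print(x, airyai(x), besselj(0, x), polygamma(1, x))

    # complex polygamma, the region where our routine is in doubt
    print(polygamma(1, mpc(2, 3)))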

My example with cyclo.spad indicates a class of possibilities:
find a promising and well-described algorithm in the literature
that is not implemented in FriCAS and add it.

On less algorithmic ground: AFAICS the current method of computing
the Laplace transform and inverse Laplace transform basically
tries to match known cases.  Here general hypergeometric
functions help, making part of it more systematic, but
still, this is mostly a business of matching special cases.
It would help if somebody implemented more cases for the Laplace
and inverse Laplace transforms.  And possibly also for the
Fourier and Z transforms (there is an old .input file by
Alasdair McAndrew, but it would need conversion to Spad
constructs).

Let me add that the usual form of contribution is fixing bugs.
My policy is to fix easily fixable bugs as soon as possible
(known bugs can mask other problems and, if left unfixed,
can lower quality), which means that fixes
for known bugs are probably not so easy.  OTOH you can
look for unknown bugs; some of them may be easily fixable.
When searching for bugs one possibility is to repeat
what Vladimir Bondarenko did: start from a large collection
of expressions, possibly randomly generated, possibly
random variations of known formulas, and use them as input
to the system.  For indefinite integration things are
easy: the derivative of an expression is supposed to be
integrable and, up to an additive constant, the result should
agree with the original expression (a toy sketch of this round
trip follows below).  For factoring one could multiply
known factors.  Getting a good scheme requires some
invention; for example, my first generator of random
expressions generated expressions that were too deep, while a
variant inspired by the Facebook folks is much better.  But one
can try other ways to bias generated expressions towards
some interesting properties.
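
As a toy illustration of the integration round trip (using SymPy as
a stand-in; for FriCAS one would pipe the same expressions through
the interpreter, and a serious generator would be richer than this
one):

    import random
    import sympy as sp

    x = sp.symbols('x')
    ATOMS = [x, sp.Integer(1), sp.Integer(2)]
    UNARY = [sp.sin, sp.exp, sp.log]

    # small depth limit: overly deep expressions were exactly the
    # problem mentioned above
    def random_expr(depth):
        if depth == 0:
            return random.choice(ATOMS)
        op = random.choice(['+', '*', 'u'])
        if op == 'u':
            return random.choice(UNARY)(random_expr(depth - 1))
        a, b = random_expr(depth - 1), random_expr(depth - 1)
        return a + b if op == '+' else a * b

    random.seed(0)
    for _ in range(5):
        e = random_expr(2)
        d = sp.diff(e, x)
        back = sp.integrate(d, x)
        # up to an additive constant the antiderivative should match e,
        # so differentiating it again must reproduce d
        print(sp.simplify(sp.diff(back, x) - d) == 0, e)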

Let me add that the first obstacle in random testing is
to have some criterion for validity of answers.  When
testing routines that use pseudo-random numbers a simple
approach is to run the same input several times.  In
many cases, when the answer differs this indicates a bug
(there are cases when the output is naturally non-unique,
as in the case of primitive elements).

Another thing is fuzzing: one deliberately introduces changes
(probably bugs).  Tests that give a different result after the
change (that is, detect the change) are good tests and candidates
to add to the testsuite.

-- 
                              Waldek Hebisch
