Re: [fricas-devel] parsing integer

Waldek Hebisch Fri, 10 Feb 2012 12:47:18 -0800

Serge D. Mechveliani wrote:
> 
> To my request on  fast parsing of Integer  Waldek Hebish  writes
> 
> > If you know that the string contains correct integer, then
> > READ_-FROM_-STRING(s)$Lisp  will work.  AFAIK there is no Spad level 
> > operation to do directly parse integer.  Probably the simplest Spad 
> > level way is:
> > integer(convert(parse(s)$InputForm)@SExpression)
> 
> > However, this will first run FriCAS parser (which you apparently want 
> > to avoid),
> > [..]
> 
> 
> I have done the following test
> (under  FriCAS-1.1.5  built from source  under GNU CLISP 2.48).
> 
> I wrote it as         I ==> Integer
>                             -- SingleInteger
>                       parseInt : String -> I
> and compared  
>     parseInt str ==  
>          pLisp str =  READ_-FROM_-STRING(str)$Lisp,
>          pSpad str =  my home made program given below, 
>          pGen  str =  integer(convert(parse(str)$InputForm)@SExpression)
> 
> on the nested loop of 10^5 digit strings of length 5. It shows:
> 
>    pSpad  is 2 times slower than  pLisp, 
>    pGen   is 8 times slower than  pSpad,
>      pSpad  is 10% faster on Integer than on SingleInteger (?),
>      pLisp  is equally fast for Integer and SingleInteger (?).
> 
> Anyway,  pSpad  is written in the standard.
> 
> Can you please comment the code?
> 
> -----------------------------------------------------------------------
> I ==> Integer
>       -- SingleInteger
> 
> )abbrev package PARSEI ParseInt
> ParseInt() : with
>                   parseInt : String -> I
>                   test     : ()     -> I
>  ==
>   add
>      parseInt(str : String) : I ==  
> 
>        -- Variants: 
>        -- READ_-FROM_-STRING(str) $ Lisp               
>        -- integer( convert(parse(str) $ InputForm)  @SExpression )
> 
>           b   :=  48     :: I
>           l   :=  # str  :: I
>           s   :=  0      :: I
>           ten :=  10     :: I
>           for i in (1 .. l) repeat 
>                             dNum := (ord(elt(str, i)) :: I) - b
>                             s    := (ten*s) + dNum 
>           s
> 
>      mapChar(js : List I) : List Character == 
>                             map(char, js) $ ListFunctions2(I, Character)
> 
>      test() ==              -- for benchmark
>        b  :=  48     :: I
>        nn :=  b + 9  :: I
>        s  :=  0      :: I
>        ns : List I :=  [] 
>  
>        for i1 in (b .. nn) repeat
>          for i2 in (b .. nn) repeat
>            for i3 in (b .. nn) repeat
>              for i4 in (b .. nn) repeat
>                for i5 in (b .. nn) repeat
>                      cs  := mapChar([i1,i2,i3,i4,i5])  :: List Character
>                      str := construct(cs) $ String
>                      i   := parseInt(str)
>                      s   := s + parseInt(str)   -- parseInt  used
>        s
>                       
>        -- For debugging:  ns := cons(i, ns)  and return  first(ns)
> ------------------------------------------------------------------------
> 
> I have the following questions.
> Can  pSpad  be improved?


If you have newest trunk the following version should be faster
(on 1.1.5 it may be slower):

     import Character
     zero := ord(char("0")) pretend SI

     parseInt(str : String) : I ==
          b := zero
          l   :=  # str  pretend SI
          s   :=  0      :: I
          ten :=  10     :: I
          for i in (1 .. l) repeat 
                            dNum := (ord(qelt(str, i)) pretend SI) - b
                            s    := (ten*s) + dNum
          s

Changes:
- using 'ord(char("0")) pretend SI' instead of 48 for readbility
  (and string value in variables for speed)
- subtraction of b is done using machine arithmetic (no danger
  of overflow here)
- elements of string are accessed using 'qelt' (in older versions
  it is slower than 'elt' but in the trunk has faster implementation).

On 2.33 GHz Core 2 using FriCAS compiled by Debian sbcl-1.40 I get
the following results (in seconds):
                                     your           newer
             READ_-FROM_-STRING      intParse      intParse

10^6*30          4.9                   4.07         1.87

10^6*6           1.51                  0.57         0.11


10^6*30 means 10^6 repetitions of 30 digit string, 10^6*6
is 10^6 repetitions of 6 digit string.  I used the following
interpreter functions to test:

do_par(s, n) == 
    r : Integer := 0; for i in 1..n repeat r := r + parseInt(s); r

do_par2(s, n) ==
    r : Integer := 0
    for i in 1..n repeat
       r := r + (READ_-FROM_-STRING(s)$Lisp pretend Integer)
    r

Note: the 0.11 timing is probably a bit spoiled due to overhead
of testing loop.  I did not use your 'test' function as it seems
to have much higher overhead.

> Should there be a standard library function   digitToInteger,
> in order to have in a user program some reliable Spad expression instead 
> of using `ord' and a strange magic constant of `48' in   
> `ord(elt(str, i)) - 48' ?

Maybe.  However instead of 48 you can use 'ord(char("0"))' which
is not so magic.  Concerning subtraction: I know about various
strange character sets, but for all that I know subtraction
works.  So for purity such function is nice, but of limited
use.  Also, to be fast it must be expanded inline (otherwise
call cost would dominate).  BTW: there is such function
at Lisp level, READ_-FROM_-STRING uses it (and its use seem
to be one (of several) reasons that Spad version is faster).

> Does  elt(str, i)  cost  O(1)  (like extraction from an array by index) ?

Yes.

> Is  ord(elt(str, i)) :: I  fast?

Reasonably.  In trunk 'qelt' is few times faster.

> Is  READ_-FROM_-STRING  written in C or assembly?
> 

sbcl READ_-FROM_-STRING is written in Lisp.  I am not sure about
Clisp, probably in C.

> As to me, both  pSpad and pLisp  are safficiently fast. 

Good to hear this.  But did you notice that pSpad and
AFAIK sbcl READ_-FROM_-STRING are O(n^2) in length of the number?

> I think that they are faster than  pGen  because they are only for the  
> digit strings.  They are not for analysing strings as  
> "(2 + 3  *4 ^5 *(6-7))  - 8",  
> they are not for parsing a generic arithmetical expression: 
> comparing priorities, setting parentheses, counting parentheses, finding 
> types, and interpreting operations or functions.
>

READ_-FROM_-STRING is quite general -- it can parse arbitrarily
complex expression, decides about (Lisp) types etc.  pGen can not
be faster than READ_-FROM_-STRING because pGen is READ_-FROM_-STRING
with extra overhead (which duplicates much of what READ_-FROM_-STRING
is doing anyway).  Both READ_-FROM_-STRING and pGen check that
the input in fact is an integer (pGen passess a string to
READ_-FROM_-STRING only if it verified that it is an integer).
pSpad trusts that string in fact is an integer.  OTOH checking
in pSpad could be quite cheap.

> If  Integer  will occure 3 times slower than  SingleInteger, I would 
> still use  Integer  -- except may be the cases when it is known for sure 
> that the data is within  SingleInteger. 
> SingleInteger  is often an error.

Of course SingleInteger should be used only if numbers are small
enough.  One could get much faster version of parseInt by
using SingleInteger for blocks (say of size 7) and treating
each block as a digit in base 10^7.

> May be, it is natural to add to the  Spad  library the function
> 
>               parseInteger,
> 
> implementing it as   READ_-FROM_-STRING(str) $ Lisp  ?
> Because for a user Spad program, it is probably better not no write 
> `$Lisp'. 
> Generally: it may be the  fastParse  category operation for a string 
> given in a certain avident restritive syntax for each appropriate type.

Hmm, it makes sense to add 'fromString' operation to few base
types.  Given speed and correctness issues I think that I would
rather add Spad version.

-- 
                              Waldek Hebisch
[email protected] 

-- 
You received this message because you are subscribed to the Google Groups 
"FriCAS - computer algebra system" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/fricas-devel?hl=en.

Re: [fricas-devel] parsing integer

Reply via email to