[fricas-devel] parsing integer

Serge D. Mechveliani Fri, 10 Feb 2012 10:07:40 -0800

In reply to my request about fast parsing of Integer, Waldek Hebisch writes:

> If you know that the string contains correct integer, then
> READ_-FROM_-STRING(s)$Lisp  will work.  AFAIK there is no Spad-level 
> operation to directly parse an integer.  Probably the simplest Spad 
> level way is:
> integer(convert(parse(s)$InputForm)@SExpression)


> However, this will first run FriCAS parser (which you apparently want 
> to avoid),
> [..]


I ran the following test
(FriCAS-1.1.5 built from source, under GNU CLISP 2.48).

I declared it as

      I ==> Integer          -- or SingleInteger
      parseInt : String -> I

and compared three definitions of  parseInt str:

      pLisp str =  READ_-FROM_-STRING(str)$Lisp,
      pSpad str =  my home-made program given below,
      pGen  str =  integer(convert(parse(str)$InputForm)@SExpression)

on a nested loop over all 10^5 digit strings of length 5. The results:

   pSpad  is 2 times slower than  pLisp, 
   pGen   is 8 times slower than  pSpad,
     pSpad  is 10% faster on Integer than on SingleInteger (?),
     pLisp  is equally fast for Integer and SingleInteger (?).
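For comparison outside FriCAS, the benchmark loop can be sketched in Python. This is an illustration only: it is not the Spad code that produced the figures above, and its timings say nothing about FriCAS. It builds each 5-character digit string from "00000" to "99999" once, parses it, and sums the results. Because the strings cover 0 .. 99999 exactly once, any correct parser must return 99999*100000/2 = 4999950000, which makes a cheap sanity check. The names parse_digits and run_benchmark are hypothetical.

```python
import itertools
import time
from typing import Callable


def parse_digits(s: str) -> int:
    """Left-to-right accumulation, like pSpad: acc := 10*acc + digit."""
    acc = 0
    for ch in s:
        acc = 10 * acc + (ord(ch) - ord("0"))
    return acc


def run_benchmark(parse: Callable[[str], int], length: int = 5) -> tuple[int, float]:
    """Parse every digit string of the given length once; return (sum, seconds)."""
    digits = "0123456789"
    start = time.perf_counter()
    total = 0
    # itertools.product yields ("0","0",...) up to ("9","9",...) in order.
    for chars in itertools.product(digits, repeat=length):
        total += parse("".join(chars))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    for name, fn in [("horner", parse_digits), ("builtin int", int)]:
        total, secs = run_benchmark(fn)
        print(f"{name:12s} sum={total} time={secs:.3f}s")
```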

Anyway,  pSpad  is written entirely in standard Spad.

Could you please comment on the code?

-----------------------------------------------------------------------
I ==> Integer
      -- SingleInteger

)abbrev package PARSEI ParseInt
ParseInt() : with
                  parseInt : String -> I
                  test     : ()     -> I
 ==
  add
     parseInt(str : String) : I ==  

       -- Variants: 
       -- READ_-FROM_-STRING(str) $ Lisp               
       -- integer( convert(parse(str) $ InputForm)  @SExpression )

          b   :=  48     :: I
          l   :=  # str  :: I
          s   :=  0      :: I
          ten :=  10     :: I
          for i in (1 .. l) repeat 
                            dNum := (ord(elt(str, i)) :: I) - b
                            s    := (ten*s) + dNum 
          s

     mapChar(js : List I) : List Character == 
                            map(char, js) $ ListFunctions2(I, Character)

     test() ==              -- for benchmark
       b  :=  48     :: I
       nn :=  b + 9  :: I
       s  :=  0      :: I
       ns : List I :=  [] 
 
       for i1 in (b .. nn) repeat
         for i2 in (b .. nn) repeat
           for i3 in (b .. nn) repeat
             for i4 in (b .. nn) repeat
               for i5 in (b .. nn) repeat
                     cs  := mapChar([i1,i2,i3,i4,i5])  :: List Character
                     str := construct(cs) $ String
                     i   := parseInt(str)
                     s   := s + i              -- parseInt called once per string
       s
                      
       -- For debugging:  ns := cons(i, ns)  and return  first(ns)
------------------------------------------------------------------------
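The loop in parseInt is Horner's scheme: it folds the digits left to right with s := 10*s + d, so "4096" gives 4, 40, 409, 4096. A Python sketch of the same fold, with a digit check added, follows. As one possible direction for very long strings only (not something measured in this thread), a divide-and-conquer variant splits the string in half and combines value(high)*10^len(low) + value(low); this can pay off only when the bignum library has subquadratic multiplication, and for 5-digit strings it would just add overhead. Both function names and the 32-digit cutoff are arbitrary assumptions.

```python
def parse_horner(s: str) -> int:
    """Mirror of pSpad: fold the digits left to right with acc := 10*acc + d."""
    acc = 0
    for ch in s:
        d = ord(ch) - ord("0")
        if not 0 <= d <= 9:
            raise ValueError(f"not a decimal digit: {ch!r}")
        acc = 10 * acc + d
    return acc


def parse_split(s: str, cutoff: int = 32) -> int:
    """Divide and conquer: value(s) = value(high) * 10**len(low) + value(low).

    For long strings this keeps both operands of each big multiplication of
    similar size, instead of multiplying a huge accumulator by 10 per digit.
    """
    if len(s) <= cutoff:
        return parse_horner(s)
    mid = len(s) // 2
    high, low = s[:mid], s[mid:]
    return parse_split(high, cutoff) * 10 ** len(low) + parse_split(low, cutoff)


if __name__ == "__main__":
    # acc runs through 4 -> 40 -> 409 -> 4096
    print(parse_horner("4096"))
    long_digits = "1234567890" * 50   # a 500-digit string
    print(parse_split(long_digits) == parse_horner(long_digits))  # True
```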

I have the following questions.
Can  pSpad  be improved?

Should there be a standard library function   digitToInteger,
so that a user program can use a reliable Spad expression instead
of  `ord' and the strange magic constant `48' in
`ord(elt(str, i)) - 48' ?
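A hypothetical digitToInteger could look like the Python sketch below. It writes ord("0") instead of the bare constant 48 (the two are equal in ASCII and Unicode) and rejects anything outside "0".."9". The explicit range test is deliberate: Python's own str.isdigit also accepts non-ASCII digits such as "٣", which a restrictive parser may not want.

```python
def digit_to_integer(ch: str) -> int:
    """Hypothetical digitToInteger: value of one ASCII decimal digit.

    Uses ord("0") instead of the magic constant 48 and rejects anything
    that is not a single character in "0".."9".
    """
    if len(ch) != 1 or not ("0" <= ch <= "9"):
        raise ValueError(f"not a decimal digit: {ch!r}")
    return ord(ch) - ord("0")
```

For example, digit_to_integer("7") returns 7, while digit_to_integer("a") raises ValueError.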

Does  elt(str, i)  cost  O(1)  (like extraction from an array by index)?
Is  ord(elt(str, i)) :: I  fast?
Is  READ_-FROM_-STRING  written in C or assembly?

To me, both  pSpad and pLisp  are sufficiently fast.
I think they are faster than  pGen  because they handle only
digit strings.  They do not analyse strings such as
"(2 + 3  *4 ^5 *(6-7))  - 8";
they do not parse a generic arithmetic expression, which involves
comparing priorities, inserting and counting parentheses, inferring
types, and interpreting operations or functions.

Even if  Integer  turned out to be 3 times slower than  SingleInteger,
I would still use  Integer,  except perhaps where the data is known for
sure to fit in  SingleInteger.
Choosing  SingleInteger  is often a mistake.
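The risk with a fixed-width type is silent overflow. The Python sketch below contrasts a checked parse, which refuses values that do not fit, with what unchecked two's-complement wrap-around would return. The 32-bit width is an arbitrary assumption for illustration: the actual SingleInteger range depends on the underlying Lisp and platform, and this sketch does not claim how FriCAS itself behaves on overflow. The function names are hypothetical.

```python
def parse_checked(s: str, bits: int = 32) -> int:
    """Parse a digit string, refusing values that do not fit a signed `bits`-bit word."""
    limit = 2 ** (bits - 1) - 1          # largest representable positive value
    acc = 0
    for ch in s:
        d = ord(ch) - ord("0")
        if not 0 <= d <= 9:
            raise ValueError(f"not a decimal digit: {ch!r}")
        if acc > (limit - d) // 10:      # 10*acc + d would exceed limit
            raise OverflowError(f"{s!r} does not fit in {bits} signed bits")
        acc = 10 * acc + d
    return acc


def parse_wrapping(s: str, bits: int = 32) -> int:
    """What unchecked two's-complement arithmetic would return instead."""
    mask = (1 << bits) - 1
    acc = 0
    for ch in s:
        acc = (10 * acc + ord(ch) - ord("0")) & mask   # keep the low `bits` bits
    # Reinterpret the low `bits` bits as a signed value.
    return acc - (1 << bits) if acc >> (bits - 1) else acc
```

With the default 32-bit assumption, parse_checked("3000000000") raises OverflowError, while parse_wrapping("3000000000") quietly returns -1294967296.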

Perhaps it would be natural to add to the Spad library a function

              parseInteger,

implemented as   READ_-FROM_-STRING(str)$Lisp ?
A user Spad program should preferably not have to write `$Lisp'.
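One caveat with using READ-FROM-STRING directly: the Common Lisp reader accepts much more than integers (symbols, lists, floats, and, while *read-eval* is true, #. forms that are evaluated at read time), so a library parseInteger would probably want to check a restrictive syntax before converting. The Python sketch below shows that check-then-convert design; the name parse_integer and the accepted syntax (optional sign followed by ASCII digits) are assumptions.

```python
import re

# Optional sign followed by one or more ASCII digits, and nothing else.
_INTEGER_SYNTAX = re.compile(r"[+-]?[0-9]+")


def parse_integer(s: str) -> int:
    """Hypothetical parseInteger: check the restrictive syntax, then convert."""
    if _INTEGER_SYNTAX.fullmatch(s) is None:
        raise ValueError(f"not an integer literal: {s!r}")
    sign = -1 if s[0] == "-" else 1
    digits = s[1:] if s[0] in "+-" else s
    acc = 0
    for ch in digits:                    # same left-to-right fold as pSpad
        acc = 10 * acc + (ord(ch) - ord("0"))
    return sign * acc
```

For example, parse_integer("-0042") returns -42, while inputs such as "(+ 1 2)" or "12e3", which a Lisp reader would happily read, raise ValueError.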
More generally, there could be a category operation  fastParse,
implemented by each appropriate type for strings in some evident,
restrictive syntax.
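A rough Python analogue of such a fastParse operation is sketched below. Python cannot add methods to built-in types, so a registry keyed by the target type stands in for the category; the names fast_parse and register_fast_parse and the two syntaxes (integers as an optional minus sign and digits, fractions as "p/q" with no spaces) are illustrative assumptions, not an existing FriCAS or Python API.

```python
import re
from fractions import Fraction
from typing import Callable, Dict, Type, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for a "fastParse" category operation: each type
# registers a parser for its own restrictive string syntax.
_FAST_PARSERS: Dict[type, Callable[[str], object]] = {}


def register_fast_parse(tp: Type[T], parser: Callable[[str], T]) -> None:
    _FAST_PARSERS[tp] = parser


def fast_parse(tp: Type[T], s: str) -> T:
    """Dispatch on the target type, as a category operation would."""
    parser = _FAST_PARSERS.get(tp)
    if parser is None:
        raise TypeError(f"no fastParse registered for {tp.__name__}")
    return parser(s)  # type: ignore[return-value]


_INT_SYNTAX = re.compile(r"-?[0-9]+")
_FRACTION_SYNTAX = re.compile(r"(-?[0-9]+)/([0-9]+)")


def _parse_int(s: str) -> int:
    # Integers: optional minus sign, then ASCII digits only.
    if _INT_SYNTAX.fullmatch(s) is None:
        raise ValueError(f"bad integer syntax: {s!r}")
    return int(s)


def _parse_fraction(s: str) -> Fraction:
    # Fractions: "<integer>/<digits>" with no spaces; a zero denominator
    # makes Fraction raise ZeroDivisionError.
    m = _FRACTION_SYNTAX.fullmatch(s)
    if m is None:
        raise ValueError(f"bad fraction syntax: {s!r}")
    return Fraction(int(m.group(1)), int(m.group(2)))


register_fast_parse(int, _parse_int)
register_fast_parse(Fraction, _parse_fraction)
```

For example, fast_parse(Fraction, "3/4") returns Fraction(3, 4), while fast_parse(Fraction, "3 / 4") raises ValueError.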

Thanks,

------
Sergei
[email protected]

-- 
You received this message because you are subscribed to the Google Groups 
"FriCAS - computer algebra system" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/fricas-devel?hl=en.
