Serge D. Mechveliani wrote:
>
> To my request on fast parsing of Integer Waldek Hebish writes
>
> > If you know that the string contains correct integer, then
> > READ_-FROM_-STRING(s)$Lisp will work. AFAIK there is no Spad level
> > operation to do directly parse integer. Probably the simplest Spad
> > level way is:
> > integer(convert(parse(s)$InputForm)@SExpression)
>
> > However, this will first run FriCAS parser (which you apparently want
> > to avoid),
> > [..]
>
>
> I have done the following test
> (under FriCAS-1.1.5 built from source under GNU CLISP 2.48).
>
> I wrote it as I ==> Integer
> -- SingleInteger
> parseInt : String -> I
> and compared
> parseInt str ==
> pLisp str = READ_-FROM_-STRING(str)$Lisp,
> pSpad str = my home made program given below,
> pGen str = integer(convert(parse(str)$InputForm)@SExpression)
>
> on the nested loop of 10^5 digit strings of length 5. It shows:
>
> pSpad is 2 times slower than pLisp,
> pGen is 8 times slower than pSpad,
> pSpad is 10% faster on Integer than on SingleInteger (?),
> pLisp is equally fast for Integer and SingleInteger (?).
>
> Anyway, pSpad is written in the standard.
>
> Can you please comment the code?
>
> -----------------------------------------------------------------------
> I ==> Integer
> -- SingleInteger
>
> )abbrev package PARSEI ParseInt
> ParseInt() : with
> parseInt : String -> I
> test : () -> I
> ==
> add
> parseInt(str : String) : I ==
>
> -- Variants:
> -- READ_-FROM_-STRING(str) $ Lisp
> -- integer( convert(parse(str) $ InputForm) @SExpression )
>
> b := 48 :: I
> l := # str :: I
> s := 0 :: I
> ten := 10 :: I
> for i in (1 .. l) repeat
> dNum := (ord(elt(str, i)) :: I) - b
> s := (ten*s) + dNum
> s
>
> mapChar(js : List I) : List Character ==
> map(char, js) $ ListFunctions2(I, Character)
>
> test() == -- for benchmark
> b := 48 :: I
> nn := b + 9 :: I
> s := 0 :: I
> ns : List I := []
>
> for i1 in (b .. nn) repeat
> for i2 in (b .. nn) repeat
> for i3 in (b .. nn) repeat
> for i4 in (b .. nn) repeat
> for i5 in (b .. nn) repeat
> cs := mapChar([i1,i2,i3,i4,i5]) :: List Character
> str := construct(cs) $ String
> i := parseInt(str)
> s := s + parseInt(str) -- parseInt used
> s
>
> -- For debugging: ns := cons(i, ns) and return first(ns)
> ------------------------------------------------------------------------
>
> I have the following questions.
> Can pSpad be improved?
If you have newest trunk the following version should be faster
(on 1.1.5 it may be slower):
import Character
zero := ord(char("0")) pretend SI
parseInt(str : String) : I ==
b := zero
l := # str pretend SI
s := 0 :: I
ten := 10 :: I
for i in (1 .. l) repeat
dNum := (ord(qelt(str, i)) pretend SI) - b
s := (ten*s) + dNum
s
Changes:
- using 'ord(char("0")) pretend SI' instead of 48 for readbility
(and string value in variables for speed)
- subtraction of b is done using machine arithmetic (no danger
of overflow here)
- elements of string are accessed using 'qelt' (in older versions
it is slower than 'elt' but in the trunk has faster implementation).
On 2.33 GHz Core 2 using FriCAS compiled by Debian sbcl-1.40 I get
the following results (in seconds):
your newer
READ_-FROM_-STRING intParse intParse
10^6*30 4.9 4.07 1.87
10^6*6 1.51 0.57 0.11
10^6*30 means 10^6 repetitions of 30 digit string, 10^6*6
is 10^6 repetitions of 6 digit string. I used the following
interpreter functions to test:
do_par(s, n) ==
r : Integer := 0; for i in 1..n repeat r := r + parseInt(s); r
do_par2(s, n) ==
r : Integer := 0
for i in 1..n repeat
r := r + (READ_-FROM_-STRING(s)$Lisp pretend Integer)
r
Note: the 0.11 timing is probably a bit spoiled due to overhead
of testing loop. I did not use your 'test' function as it seems
to have much higher overhead.
> Should there be a standard library function digitToInteger,
> in order to have in a user program some reliable Spad expression instead
> of using `ord' and a strange magic constant of `48' in
> `ord(elt(str, i)) - 48' ?
Maybe. However instead of 48 you can use 'ord(char("0"))' which
is not so magic. Concerning subtraction: I know about various
strange character sets, but for all that I know subtraction
works. So for purity such function is nice, but of limited
use. Also, to be fast it must be expanded inline (otherwise
call cost would dominate). BTW: there is such function
at Lisp level, READ_-FROM_-STRING uses it (and its use seem
to be one (of several) reasons that Spad version is faster).
> Does elt(str, i) cost O(1) (like extraction from an array by index) ?
Yes.
> Is ord(elt(str, i)) :: I fast?
Reasonably. In trunk 'qelt' is few times faster.
> Is READ_-FROM_-STRING written in C or assembly?
>
sbcl READ_-FROM_-STRING is written in Lisp. I am not sure about
Clisp, probably in C.
> As to me, both pSpad and pLisp are safficiently fast.
Good to hear this. But did you notice that pSpad and
AFAIK sbcl READ_-FROM_-STRING are O(n^2) in length of the number?
> I think that they are faster than pGen because they are only for the
> digit strings. They are not for analysing strings as
> "(2 + 3 *4 ^5 *(6-7)) - 8",
> they are not for parsing a generic arithmetical expression:
> comparing priorities, setting parentheses, counting parentheses, finding
> types, and interpreting operations or functions.
>
READ_-FROM_-STRING is quite general -- it can parse arbitrarily
complex expression, decides about (Lisp) types etc. pGen can not
be faster than READ_-FROM_-STRING because pGen is READ_-FROM_-STRING
with extra overhead (which duplicates much of what READ_-FROM_-STRING
is doing anyway). Both READ_-FROM_-STRING and pGen check that
the input in fact is an integer (pGen passess a string to
READ_-FROM_-STRING only if it verified that it is an integer).
pSpad trusts that string in fact is an integer. OTOH checking
in pSpad could be quite cheap.
> If Integer will occure 3 times slower than SingleInteger, I would
> still use Integer -- except may be the cases when it is known for sure
> that the data is within SingleInteger.
> SingleInteger is often an error.
Of course SingleInteger should be used only if numbers are small
enough. One could get much faster version of parseInt by
using SingleInteger for blocks (say of size 7) and treating
each block as a digit in base 10^7.
> May be, it is natural to add to the Spad library the function
>
> parseInteger,
>
> implementing it as READ_-FROM_-STRING(str) $ Lisp ?
> Because for a user Spad program, it is probably better not no write
> `$Lisp'.
> Generally: it may be the fastParse category operation for a string
> given in a certain avident restritive syntax for each appropriate type.
Hmm, it makes sense to add 'fromString' operation to few base
types. Given speed and correctness issues I think that I would
rather add Spad version.
--
Waldek Hebisch
[email protected]
--
You received this message because you are subscribed to the Google Groups
"FriCAS - computer algebra system" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/fricas-devel?hl=en.