On Mon, Apr 22, 2013 at 1:21 PM, Wendy Lin <[email protected]> wrote:
> Does anyone have a working algorithm to sort a very large array (>10
> million entries) of compound variables or filter whole entries?

In that case $ print -C compoundvar # is your "friend" because it
prints a compound variable on a single line.
With that you can use a small loop that reads the compound variables
from stdin and puts one or more sort keys in front of each one, and
then let AST "sort" do its job (for large amounts of data the "sort"
utility scales a lot better than using typeset -m). The same approach
works with AST "grep" to filter compound variables based on a given
key.
After sorting, the keys should be removed again and the compound data
can be read back into the array via $ read -C comvar # ...
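
One detail matters when choosing the key format: "sort" compares key
fields as strings unless told otherwise, so a key printed with plain %d
does not come out in numeric order. The following is a minimal sketch
in plain POSIX shell (so it should also run under ksh93). The key
values and the "data" placeholder are made up for illustration; in the
real pipeline that field would hold the print -C output.

```shell
# Compare how plain and zero-padded numeric keys sort as strings.
tab=$(printf '\t')

# hypothetical keys; "data" stands in for the print -C output
plain=$(
	printf 'KEY1=%d\tdata\n' 10 2 1 |
	LC_ALL=C sort -t "$tab" -k1,1 |
	cut -f1 | paste -s -d ' ' -
)

# same keys, zero-padded to a fixed width of five digits
padded=$(
	printf 'KEY1=%05d\tdata\n' 10 2 1 |
	LC_ALL=C sort -t "$tab" -k1,1 |
	cut -f1 | paste -s -d ' ' -
)

printf 'plain:  %s\n' "$plain"
printf 'padded: %s\n' "$padded"
```

The plain keys come out as 1, 10, 2, while the padded keys come out
as 00001, 00002, 00010. The width of five is an arbitrary choice for
this sketch; it only has to be wide enough for the largest key.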

Example:
-- snip --
# create a stream of compound variables and
# send it to file 'xxx'
print -u2 '# creating test data set...'
(
        integer i

        seq 2000 | while read i ; do
                (( i=i % 100 ))
                compound c
                integer c.i
                typeset c.s=$(printf "hello_%x" i)
                float c.a
                (( c.i=i ))
                (( c.a=sin(i) ))
                print -v c
        done >'xxx'
)

# read stream of compound variables
# for each compound variable we add two sort keys
# separated by a <TAB> character, followed by the
# plain compound variable printed in a single line
# via $ print -C comvar #
print -u2 '# adding sort keys...'
(
        compound comvar

        while read -C comvar ; do
                # print sort key
                printf 'KEY1=%d\t' comvar.i
                printf 'KEY2=%x\t' $((comvar.i%16))
                print -C comvar
        done <'xxx' >'xxx_unsorted'
)


print -u2 '# sorting...'
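# sort data, either via KEY1 or KEY2
# (-kN,N limits the sort key to field N; keys are compared as
# strings, so KEY1=10 sorts before KEY1=2 unless the key is
# zero-padded, e.g. with %05d)
#sort -t $'\t' -k1,1 <'xxx_unsorted' >'xxx_sorted'
sort -t $'\t' -k2,2 <'xxx_unsorted' >'xxx_sorted'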


# now read the sorted data, remove the KEY1/KEY2 sort
# keys and store the compound variables in array "sar"
print -u2 '# removing sort keys and storing data...'
(
        compound -a sar
        integer num_sar=0
        typeset dummy_key1 dummy_key2 s

        while IFS=$'\t' read -r dummy_key1 dummy_key2 s ; do
                # bug: can't use sar[num_sar++] here
                read -C sar[num_sar] <<<"$s"
                (( num_sar++ ))
        done <'xxx_sorted'
        
        print -v sar
)

exit 0
-- snip --
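
The "grep" variant mentioned above is not part of the example, so here
is a minimal sketch of it. It assumes the same record layout as
'xxx_unsorted' (KEY1, KEY2 and the compound variable, separated by
<TAB> characters). The sample records are hypothetical and only mimic
what print -C would emit. It is written in plain POSIX shell, so it
should also run under ksh93.

```shell
# Filter keyed records by KEY2, then strip the keys again.
# Assumed record layout:
#   KEY1=<n><TAB>KEY2=<hex><TAB><compound variable on one line>
tab=$(printf '\t')

# hypothetical sample records; the third field imitates print -C output
keyed=$(
	printf 'KEY1=1\tKEY2=1\t(typeset -l -i i=1;s=hello_1)\n'
	printf 'KEY1=17\tKEY2=1\t(typeset -l -i i=17;s=hello_11)\n'
	printf 'KEY1=2\tKEY2=2\t(typeset -l -i i=2;s=hello_2)\n'
)

# keep only records whose second field is exactly KEY2=1; anchoring
# the pattern at the start of the line and at the <TAB> separators
# prevents accidental matches inside the compound data.
# cut -f3- then drops both key fields and keeps the rest of the line.
filtered=$(
	printf '%s\n' "$keyed" |
	grep "^KEY1=[0-9]*${tab}KEY2=1${tab}" |
	cut -f3-
)

printf '%s\n' "$filtered"
```

This prints the two records with KEY2=1 without their keys. In the
ksh93 script the filtered lines could then be fed to the same
read -C loop that the example uses after sorting.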

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users