I ran into a performance wall in GAP while handling lookup tables. By telling the story I hope to either get advice on how to handle the problem (although I found a way out and explain it later), or point the maintainers to a possible improvement in GAP's guts.
The story begins with an intricate structure, whose nodes are records uniquely labelled by strings, with several string pointing to other records. Some of these pointers are labels of nonexistent nodes. These nodes are on a list N, ordered by their labels as strings. Sizes: just under 1,000,000 nodes, and about 2,500,000 labels. (BTW: the nodes are elements of a group I am investigating, so it is quite natural to work on it in GAP) I had to traverse the structure, following these pointers. So I needed a lookup table to convert labels into records. First, obvious solution: produced an ordered list L of all labels (that was fast!). Although L is longer than N, it is true that the label of N[i] is L[i], so, given a label s, the corresponding record is N[Position(L,s)]. Also, since L is ordered, lookup should be fast. That seems fine, so I tried a traversal, which I know takes linear time on the number of links. It took forever. I had the routine print a progress report, and it was clearly crawling. Then I tried an alternative: keep L, and replace in N each label by its corresponding index in L. Tried it in place and tried it generating a new list. In both cases it crawled. At this point, it is worth mentioning that the hardware I use is up to the task: fast 64bit processors, enough memory (I allowed GAP 18GB, but it never reached 8). Last chance: use a record as an associative array. That is, create a record R such that, for every label s, R.(s) = Position(L,s). Looking at the GAP source was encouraging, as records are implemented using hashing. Filling up R is easy: for every index i on L, R.(L[i]):=i. Fortunately, I had a progress meter. It started very fast, up to 300,000. Then it petered out to more that a second per index. So, that is the story. Could I have done any better or could the record implementation get some revamping? I solved my problem by going out of GAP. First I printed the structure to a file (2.5GB), and then processed it through a small perl program to do the transformation I tried before. The logic is the same as I tried with R, usig a perl hash. In less than 60 seconds it produced a GAP-readable file containing L and the transformed N. That really solved my problem. After adapting the traversal routines, now GAP processes N any which way I need very quickly. Still, I spent quite sometime trying a pure GAP solution; maybe GAP could borrow the hash implementation from perl. Cheers, Arnaldo Mandel Departamento de Ciência da Computação - Computer Science Department Universidade de São Paulo, Bra[sz]il [EMAIL PROTECTED] Talvez você seja um Bright http://the-brights.net Maybe you are a Bright. _______________________________________________ Forum mailing list [email protected] http://mail.gap-system.org/mailman/listinfo/forum
