Hi list members,

I have a problem with a script using hashes. I have been using hashes for years but have never seen this kind of problem before.

I have an ASCII file with 6.5 MB of data. The file is tokenized with the Parse::Lex module, and the tokens are stored in a two-level hash: $TokenHash{$TokenType}->{$TokenID} = $TokenValue. The file contains 536,332 tokens, which end up under 79 keys in %TokenHash. I evaluate the hash with two nested loops, one for each level. Because I need to move back and forth through the _sorted_ hash while inside these loops, I can't use the built-in iteration ("foreach $key1 (keys %TokenHash) ..."), so I decided to use Tie::LLHash.

Now I'm amazed by the memory consumption: the script uses up to 300 MB to process this small file, which in the end produces a 3.5 MB output file. I developed and tested the script with a 2 KB subset of the original file, so I never hit the problem during testing. A simple "if (not exists $TokenHash{$TokenType}->{$TokenID}) {}" uses 110 MB of memory; I measured this with the code that actually stores the elements into the hash commented out. Tokenizing the file by itself uses only 4 MB, so in my opinion the problem is related to the hash operations. In production the files to be processed will be up to several hundred MB in size, so memory usage is a real issue for me.

I also tried plain built-in hashes, just to make sure the module isn't the problem, but I got the same strange results. I also tried multi-dimensional arrays, but they too use up to 50 MB of memory.
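To make the setup concrete, here is a stripped-down sketch of how the tokens are read and stored. The token definitions and the running-counter ID are invented for this example; my real script is more involved, and I'm going by the Tie::LLHash docs for the lazy-mode option:

    #!/usr/bin/perl -w
    use strict;
    use Parse::Lex;
    use Tie::LLHash;

    # Invented token definitions, just for the example.
    my @tokens = (
        'NUMBER' => '\d+',
        'WORD'   => '[A-Za-z]+',
        'OTHER'  => '.',
    );

    my $lexer = Parse::Lex->new(@tokens);
    $lexer->skip('\s+');              # treat all whitespace as inter-token
    open(FH, 'input.txt') or die "open: $!";
    $lexer->from(\*FH);

    # Lazy mode: assigning to a new key appends it, so the first
    # level keeps its insertion order.
    my %TokenHash;
    tie %TokenHash, 'Tie::LLHash', { lazy => 1 };

    my $id = 0;
    while (1) {
        my $token = $lexer->next;
        last if $lexer->eoi;
        my ($type, $value) = ($token->name, $token->text);
        # Tied hashes don't autovivify, hence the explicit check.
        $TokenHash{$type} = {} unless exists $TokenHash{$type};
        $TokenHash{$type}->{++$id} = $value;   # running counter as ID
    }
    close(FH);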
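And this is roughly how the evaluation walks the first hash level, which is the reason I picked Tie::LLHash in the first place: as far as I understand the module, the tie object's first/next/prev methods let me step in both directions. The numeric sort on the second level is a simplification of what my script really does:

    # Walk the first level in its stored order.  The tie object lets
    # me step forward with next() and backward with prev(), which a
    # plain foreach over keys() would not allow.
    my $obj  = tied %TokenHash;
    my $type = $obj->first;                # first key in stored order
    while (defined $type) {
        # Second level: an ordinary hash, sorted numerically by ID.
        foreach my $id (sort { $a <=> $b } keys %{ $TokenHash{$type} }) {
            my $value = $TokenHash{$type}->{$id};
            # ... evaluate, possibly peeking at $obj->prev($type)
            # or $obj->next($type) ...
        }
        $type = $obj->next($type);         # undef after the last key
    }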
Is there anything I need to consider? Anybody with the same experience?

Perl: 5.6.1
Tie::LLHash: 1.002
Parse::Lex: 2.15

Best regards,
Oliver