I need to handle a set of numbers where I want only the lowest 1000.
There will be no rhyme or reason as the data comes in. I will do some
calculations and take this total against the array or hash or ????? The
number of calculations will be tremendous and either I come up with a way to
handle the lowest 1000 efficiently or I have to write each line which will come
severl 100 million plus and pare it down from there.
I tried just a simple search and with the 1000 entries, it slowed
everything down to a crawl over time( just started at the front of the array
and worked way through). What the data will be is a total which is the key (
if same total comes through don't need, throw away) plus the supporting data
which will between 600 to 800 bytes per key. Not large, but when you start
processing the key and either do the move or ??? then starts to add up.
I have looked at some of the modules on CPAN:
Tree::Binary::Search
A Binary Search Tree for perl
Tree::RedBlack
This is an implementation of a red-black tree which is a type of
balanced binary tree (to the best of my knowledge that is, I am sure I am
simplifying it). Tree::Binary::Search does not attempt to balance the tree, so
if you are looking for a balanced tree, you might try this.
Tree::BPTree
This module implements a B+ tree, rather than a binary search tree. In
the authors own words, "B+ trees are balanced trees which provide an ordered
map from keys to values. They are useful for indexing large bodies of data.
They are similar to 2-3-4 Trees and Red-Black Trees. This implementation
supports B+ trees using an arbitrary n value." I am not quite sure exactly how
B+ Tree's work, but I am intrigued but this module. It seems to me to be well
tested module as well. If you are looking for a B+ Tree, I suggest giving it a
look.
Tree::M
In its own words, this module "implement M-trees for efficient
'metric/multimedia-searches'". From what I can tell, this module is not a
b-tree (binary search tree), but an m-tree, which is a tree optimized to handle
multi-dimensional (spatial) data, such as latitude and longitude. It is a
wrapper around a C++ library.
Tree::FP
In the authors own words, "Tree:FP is a Perl implmentation of the
FP-Tree based association rule mining algorithm (association rules == market
basket analysis). For a detailed explanation, see "Mining Frequent Patterns
without Candidate Generation" by Jiawei Han, Jian Pei, and Yiwen Yin, 2000.
Contrarywise, most books on data mining will have information on this
algorithm." While it sounds like a very cool thing, it is not a binary search
tree.
Tree::Ternary
This is a ternary search trees, as opposed to a binary search tree. Similar,
but different. If two nodes isn't enough for you, I suggest taking a look at
this. These is also an XS based implementation Tree::Ternary_XS.
I read and looked at the docs, but nothing jumped out at me as that
this might solve what I am trying to do. As part of the processing, it must
remove obviously the highest value when it goes over 1000 also.
Any ideas or thoughts would be greatly appreciated.
Thanks!
Wags ;)
*******************************************************
This message contains information that is confidential
and proprietary to FedEx Freight or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.
*******************************************************
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>