From your description, I take it this list lives in central storage
rather than on DASD. If so, it sounds like a good candidate for a hash
table, provided you have some idea of the maximum number of entries it's
likely to have to accommodate.
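Here is a minimal C sketch of that idea. It assumes the maximum entry count is known when the list is created. The table uses open addressing with linear probing and is sized to at least twice the expected maximum, so it stays at most half full. The names (`hashset`, `hs_init`, `hs_add`, `hs_free`) are hypothetical. The hash is the 32-bit finalizer from MurmurHash3, chosen only because it scatters nearby keys well; any good integer mix would do.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical open-addressing hash set for 4-byte keys.
   Assumes the maximum number of entries is known up front. */
typedef struct {
    uint32_t *keys;   /* slot contents */
    uint8_t  *used;   /* 1 if the slot is occupied, so key 0 is allowed */
    size_t    mask;   /* capacity - 1; capacity is a power of two */
    size_t    count;  /* number of occupied slots */
} hashset;

/* MurmurHash3 fmix32 finalizer: mixes all key bits into the low bits. */
static uint32_t hs_hash(uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Size the table to at least 2 * max_entries (a power of two) so the
   load factor stays at or below 50%. Returns 0 on success, -1 on failure. */
static int hs_init(hashset *hs, size_t max_entries) {
    size_t cap = 16;
    while (cap < max_entries * 2)
        cap <<= 1;
    hs->keys = malloc(cap * sizeof *hs->keys);
    hs->used = calloc(cap, 1);
    if (hs->keys == NULL || hs->used == NULL) {
        free(hs->keys);
        free(hs->used);
        return -1;
    }
    hs->mask = cap - 1;
    hs->count = 0;
    return 0;
}

/* Test-and-insert in one pass: returns 1 if key is new (and now stored),
   0 if it was already present, -1 if the table is full. */
static int hs_add(hashset *hs, uint32_t key) {
    size_t i = hs_hash(key) & hs->mask;
    while (hs->used[i]) {          /* linear probing */
        if (hs->keys[i] == key)
            return 0;
        i = (i + 1) & hs->mask;
    }
    if (hs->count >= hs->mask)     /* always keep one empty slot so probes end */
        return -1;
    hs->used[i] = 1;
    hs->keys[i] = key;
    hs->count++;
    return 1;
}

/* Release the whole table at once when processing is finished. */
static void hs_free(hashset *hs) {
    free(hs->keys);
    free(hs->used);
    hs->keys = NULL;
    hs->used = NULL;
    hs->count = 0;
}
```

`hs_add` combines the lookup and the insert, so each incoming item costs one call: 1 means new and 0 means already seen. No entries are ever moved once stored. Because the whole list is thrown away at the end, individual deletion is never needed.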

--

Regards, Gord Tomlin
Action Software International
(a division of Mazda Computer Corporation)
Tel: (905) 470-7113, Fax: (905) 470-6507

Patrick Roehl wrote:
I’m looking for advice on how to handle a potentially large list of data.
The list consists of 4-byte entries, and the application needs to know
if an incoming item is already present or is new to the list.  This is the
approach that is currently in use and that I’d like to improve upon:

1) Perform a binary search and process no further if the item is already
present

2) If there is not enough room to add a new entry, allocate a new storage
area 1.5 times the size of the old area, MVCL the existing data to the new
area, and free the old area.

3) The binary search from step 1 indicates where the new entry should be
inserted.  To add the entry to the list, individual entries are moved one
at a time (to avoid overlapping moves) to open a spot in the list for the
new entry.
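For comparison, here is a C sketch of the three steps as described, which could serve as a baseline when timing alternatives. `realloc` stands in for the allocate/MVCL/free sequence of step 2. A single `memmove` replaces the entry-by-entry shifting of step 3, because the C standard defines it for overlapping regions. The names (`sorted_list`, `sl_search`, `sl_add`) are hypothetical. This only shrinks the constant factor: an insertion near the front still moves almost the whole list.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sorted array of 4-byte entries. */
typedef struct {
    uint32_t *items;
    size_t    count;
    size_t    capacity;
} sorted_list;

/* Step 1: binary search. Returns the index of key and sets *found = 1,
   or returns the insertion point and sets *found = 0. */
static size_t sl_search(const sorted_list *sl, uint32_t key, int *found) {
    size_t lo = 0, hi = sl->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sl->items[mid] < key)
            lo = mid + 1;
        else if (sl->items[mid] > key)
            hi = mid;
        else {
            *found = 1;
            return mid;
        }
    }
    *found = 0;
    return lo;
}

/* Returns 1 if added, 0 if already present, -1 on allocation failure. */
static int sl_add(sorted_list *sl, uint32_t key) {
    int found;
    size_t pos = sl_search(sl, key, &found);
    if (found)
        return 0;
    if (sl->count == sl->capacity) {   /* step 2: grow by 1.5x */
        size_t newcap = sl->capacity < 16 ? 16 : sl->capacity + sl->capacity / 2;
        uint32_t *p = realloc(sl->items, newcap * sizeof *p);
        if (p == NULL)
            return -1;
        sl->items = p;
        sl->capacity = newcap;
    }
    /* Step 3: one overlapping move opens the slot at pos. */
    memmove(&sl->items[pos + 1], &sl->items[pos],
            (sl->count - pos) * sizeof *sl->items);
    sl->items[pos] = key;
    sl->count++;
    return 1;
}
```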

This old process has worked well for fairly small lists but I’d like
opinions on how to improve this process for large lists (say, a million or
more).
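A rough cost estimate shows why step 3 dominates at this size. It assumes items arrive in random order, so each insertion into a list of k entries lands at a uniformly random position and shifts k/2 entries on average. N is the final number of entries.

```latex
\begin{align*}
\text{shifting (step 3)} &\approx \sum_{k=0}^{N-1} \frac{k}{2} = \frac{N(N-1)}{4} \approx \frac{N^2}{4} \\
N = 10^6:\quad \frac{N^2}{4} &= 2.5\times 10^{11}\ \text{entry moves} \approx 10^{12}\ \text{bytes} \\
\text{growth copies (step 2, 1.5x)} &\lesssim 3N = 3\times 10^{6}\ \text{entries} \\
\text{binary search (step 1)} &\approx \log_2 10^6 \approx 20\ \text{comparisons per item}
\end{align*}
```

Under that assumption, the shifting for a million entries is several orders of magnitude more work than the searching or the growth copies. That is why a structure that never shifts entries, such as a hash table, is attractive here.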

Using SORT is not an option because of the multi-threaded online
environment (it’s running in CICS).

The list is only used by a single process that handles data as it
arrives.  To process correctly, it must be able to determine immediately
if the data being presented has already been processed.  When all of the
incoming data for that process has been handled, the list is discarded.

Speed and efficiency are important.  All suggestions regarding logic and
coding techniques are appreciated!

