Hi Scott,

        I tried your suggestion of turning smallList into an associative 
array with the index for each element equal to the text I'm looking for 
in bigList.  I think I must have misunderstood your suggestion because 
the handler runs much slower than previously, perhaps because I've got 
it asking for the keys of smallList for every line of bigList.  Here's 
what I tried.

-- Note: smallListArray is an array built from the original
-- smallList variable

repeat for each line i in bigList
     if item 6 of i is in keys(smallListArray) then
       put i into hitList[item 6 of i]
     end if
end repeat


Message: 3
Subject: Re: Comparing big lists
Date: Sat, 27 Apr 2002 16:10:42 -0400
From: Gregory Lypny <[EMAIL PROTECTED]>
To: "MetaCard List" <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]

Thanks for the suggestion, Scott.  I'll give it a shot.  I've also tried
looping over the lines of bigList (i.e., a nested repeat), simply using
the 'in' operator:  if x is in y, then...  It takes about 6 minutes on a
modest (300 MHz) iBook running OS X, but I'm hoping for an improvement.

      Regards,

           Greg

On 27/4/2002 12:08 PM, [EMAIL PROTECTED] wrote:

Message: 2
Date: Fri, 26 Apr 2002 12:48:53 -0600 (MDT)
From: Scott Raney <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Re: Comparing big lists
Reply-To: [EMAIL PROTECTED]

On: Thu, 25 Apr 2002 Gregory Lypny <[EMAIL PROTECTED]> wrote:

   Thought I would pick your brains on the topic of comparing two big
lists.  Both are tab delimited.  bigList has about 100,000 lines and
6 items (columns) per line.  smallList is about 15,000 lines and 2
items per line.  I want to identify the lines in bigList in which
the third item is the same as the second item in a line in
smallList, and then pull out the intersection.  I used something
like this, which works fine.

     set the itemDelimiter to tab
     repeat for each line j of smallList
          put lineOffset(item 2 of j, bigList) into thisLine
          if thisLine is not 0 then put j & tab & \
               line thisLine of bigList & return after mergedList
     end repeat
     -- Get rid of the trailing Return
     delete last character of mergedList

Using the lineOffset function seemed the obvious choice to me, but I'm
also interested in other approaches.

LineOffset on such a big variable is going to be pretty expensive.
Another option would be to use split to build an array out of smallList
and then loop over each line in bigList and see if there is an array
index for it.  Split takes a while and will use up a good bit of
memory, but makes the lookups *much* faster.  You could save some of
that space by building up an array of just the relevant items in one
list or the other by looping over the lines and creating one array
index for each.
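
A rough sketch of the looping variant, not a tested handler.  It
assumes the match is item 3 of bigList against item 2 of smallList, as
in the original description; smallArray and theKey are names made up
for illustration.  If a key occurs more than once in smallList, only
the last line with that key is kept.

```
-- Sketch only: smallArray and theKey are illustrative names
set the itemDelimiter to tab
put empty into mergedList

-- Build one array element per line of smallList, keyed on item 2
repeat for each line j in smallList
  put j into smallArray[item 2 of j]
end repeat

-- One pass over bigList; each test is a direct array lookup,
-- so the list of keys is never requested or scanned
repeat for each line i in bigList
  put item 3 of i into theKey
  if theKey is empty then next repeat
  if smallArray[theKey] is not empty then
    put smallArray[theKey] & tab & i & return after mergedList
  end if
end repeat

-- Get rid of the trailing Return
delete last character of mergedList
```

Unlike the lineOffset version, which stops at the first hit for each
smallList line, this should pick up every line of bigList whose key
matches.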
  Regards,
    Scott

     Regards,
         Greg

_______________________________________________
metacard mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/metacard
