On Sun, Aug 7, 2011 at 15:58, Rob Dixon <rob.di...@gmx.com> wrote:
> On 07/08/2011 20:30, Shawn H Corey wrote:
>> On 11-08-07 03:20 PM, shawn wilson wrote:
>>> It can be sped up (slightly) with an index.
>>
>> Indexes in SQL don't normally speed up sorting. What they're best at is
>> selecting a limited number of records, usually less than 10% of the
>> total. Otherwise, they just get in the way.
>>
>> The best you can do with a database is to keep the table sorted by the
>> key most commonly used. This is different from an index. An index is an
>> additional file that records the keys and the offset to the record in
>> the table file. The index file is sorted by its key.
>
> Exactly. So to sort a database in the order of its key field, all that is
> necessary is to read sequentially through the index and pull out the
> corresponding record.
>
> I would suggest that the OP could do this 'manually', i.e. build a
> separate index file with just the key fields and pointers into the
> primary file. Once that is done the operation is trivial, even more so
> if the primary file has fixed-length records (and if not I would like a
> word with the person who decided on a 44G file that must be read
> sequentially!).
>
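That manual-index idea might look roughly like the sketch below. The record and key widths, and the in-memory demo data, are made-up assumptions; a real run would open the 44G file and would probably write the index out to disk rather than hold it in memory.

```perl
use strict;
use warnings;

# Hypothetical layout: 16-byte fixed records, 8-byte key then 8-byte payload.
my $RECLEN = 16;
my $KEYLEN = 8;

# Demo data in an in-memory filehandle; a real run would open the big file.
my $data = 'banana  payload1' . 'apple   payload2' . 'cherry  payload3';
open my $fh, '<:raw', \$data or die $!;

# Pass 1: scan once, recording each record's key and byte offset.
my @index;
my $offset = 0;
while ( read( $fh, my $rec, $RECLEN ) == $RECLEN ) {
    push @index, [ substr( $rec, 0, $KEYLEN ), $offset ];
    $offset += $RECLEN;
}

# Pass 2: sort the small index, then seek() to pull records in key order,
# without ever sorting the big file itself.
my @in_key_order;
for my $entry ( sort { $a->[0] cmp $b->[0] } @index ) {
    seek $fh, $entry->[1], 0 or die $!;
    read $fh, my $rec, $RECLEN;
    push @in_key_order, $rec;
}
print "$_\n" for @in_key_order;
```

The point is that only the (key, offset) pairs need to fit in memory or be sorted; the data file is touched once sequentially and then only via seek().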
I really do think it could be done in Perl pretty easily:

    my %idx;
    while ( my $line = <> ) {
        $csv->parse($line) or next;
        my ($key) = $csv->fields;
        push @{ $idx{$key} }, $.;    # line numbers seen for this key
    }

Then you have a nice data structure of your values and duplicates along with line numbers. You can then loop through the file again and pull out your lines.

I still think this is the wrong approach, as this is database data: it should be in a database and should never have been put in a 44G flat file in the first place. But....