On Sun, Aug 7, 2011 at 15:58, Rob Dixon <rob.di...@gmx.com> wrote:
> On 07/08/2011 20:30, Shawn H Corey wrote:
>>
>> On 11-08-07 03:20 PM, shawn wilson wrote:
>>>
>>> It can be sped up (slightly) with an index.
>>
>> Indexes in SQL don't normally speed up sorting. What they're best at is
>> selecting a limited number of records, usually less than 10% of the
>> total. Otherwise, they just get in the way.
>>
>> The best you can do with a database is to keep the table sorted by the
>> key most commonly used. This is different than an index. An index is an
>> additional file that records the keys and the offset to the record in
>> the table file. The index file is sorted by its key.
>
> Exactly. So to sort a database in the order of its key field all that is
> necessary is to read sequentially through the index and pull out the
> corresponding record.
>
> I would suggest that the OP could do this 'manually'. i.e. build a
> separate index file with just the key fields and pointers into the
> primary file. Once that is done the operation is trivial: even more so
> if the primary file has fixed-length records (and if not I would like a
> word with the person who decided on a 44G file that must be read
> sequentially!).
>

i really do think it could be done in perl pretty easily:

use Text::CSV;
my $csv = Text::CSV->new({ binary => 1 });
my $idx;
while ( <> ) {
    $csv->parse($_) or next;
    my ($key) = $csv->fields;      # assuming the key is the first field
    push @{ $idx->{$key} }, $.;    # every line number where this key appears
}

then you have a nice data structure of your values and their duplicates
along with line numbers. you can then loop through the file again and
pull out your lines.
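a sketch of both passes, using a toy in-memory "file" and a naive
split instead of Text::CSV just to keep it self-contained (the real
thing would open the actual file twice):

```perl
use strict;
use warnings;

# toy stand-in for the 44G file; real code would read the same file twice
my $file = "x,1\ny,2\nx,3\n";

# first pass: record line numbers per key (naive split for illustration)
my %idx;
open my $fh, '<', \$file or die $!;
while (<$fh>) {
    my ($key) = split /,/;
    push @{ $idx{$key} }, $.;
}
close $fh;

# second pass: pull only the lines belonging to duplicated keys
my %want = map  { $_ => 1 }
           map  { @{ $idx{$_} } }
           grep { @{ $idx{$_} } > 1 } keys %idx;

open $fh, '<', \$file or die $!;
my @dup_lines;
while (<$fh>) {
    push @dup_lines, $_ if $want{$.};
}
close $fh;
# @dup_lines now holds every line whose key occurred more than once
```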

i still think this is the wrong approach, though: this data belongs in
a db and should never have been put in a 44G flat file in the first
place. but....
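for the record, Rob's manual-index idea (keys plus byte offsets, which
only works cleanly with fixed-length records) might look something like
this — record length, key width, and the data here are all made-up
values for illustration:

```perl
use strict;
use warnings;

# hypothetical layout: 8-byte records whose first 3 bytes are the key
my $rec_len = 8;
my $key_len = 3;

# in-memory stand-in for the flat file; real code would read() from disk
my $data = "aaaXXXXXbbbYYYYYaaaZZZZZ";

my %offset_idx;
my $offset = 0;
while ( $offset + $rec_len <= length $data ) {
    my $key = substr $data, $offset, $key_len;
    push @{ $offset_idx{$key} }, $offset;   # a key may occur more than once
    $offset += $rec_len;
}
# with the real file you would later seek() straight to each saved
# offset and read exactly $rec_len bytes, instead of scanning 44G again
```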

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/