On Aug 30, 2011, at 12:56 PM, Ken Irwin wrote:
> I have a feeling it may be time for me to learn some grown-up programming
> skills, and I hope someone here might be able to help.
>
> I have a PHP script chewing over a large MySQL query. It's creating a handful
> of big associative arrays in the process, and punks out after the arrays get
> to 32MB.
> Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to
> allocate 242 bytes) in analysis.php on line 41
>
> I'm basically building a big associative array encoding the name of the
> borrowing institution, the patron type (student, faculty, staff, etc.), the item
> barcode, and the various bits of data we want for each of these items.
>
> // the first time we see a barcode
> $stats[$inst][$patron_type][$item_barcode]['first'] = $date;
> $stats[$inst][$patron_type][$item_barcode]['min'] = $renewals;
> $stats[$inst][$patron_type][$item_barcode]['call'] = $call_no;
> // subsequent instances of the barcode
> $stats[$inst][$patron_type][$item_barcode]['max'] = $renewals;
>
> Once I've chewed over all 4 million records (40MB), I spit it out into a new
> MySQL table that has the collated data that I want. Unfortunately this system
> breaks down when I get to so many millions of records.
>
> Is there a more efficient way of doing this kind of data transformation? I
> *could* not keep so much in the big stats array and instead make millions of
> "UPDATE" calls to the MySQL table, but that doesn't sound like a winning
> proposition to me. I imagine that I could also increase the memory allotment,
> but it will eventually get too big anyway. Or I suppose that I could do it all in
> chunks - right now I'm parsing the whole raw-data SQL table at once; I could
> do one institution's data at a time and buy myself some wiggle-room.
>
> But fundamentally, it seems to me that asking PHP to hold lots of data in an
> array might not be the most programmerly system; it's just what I've always
> done.
I don't know how you're getting the data in, so I don't know how much you need
to keep in memory during the operation. I'm also not familiar with how
memory-efficient PHP is in general, so I can't recommend changes in that
regard, either.
What I can advise is two things --
When you mentioned millions of mysql 'UPDATE' calls -- that's rarely efficient.
Instead, write it out to some sort of text file (tab delim, CSV, etc), and then
use the 'LOAD DATA ... REPLACE' command:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
For this to work, you need to have your primary keys set up correctly, as
that's what it'll use for determining if it should do an INSERT or an UPDATE.
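A rough sketch of what that could look like, sticking with the mysql_* API that's standard in PHP today -- the table name (collated_stats), column names, and the sample data are all invented for illustration, and the REPLACE assumes a PRIMARY KEY on (inst, patron_type, barcode):

```php
<?php
// Hypothetical sketch: flatten the collated $stats array into one
// tab-delimited line per item, then load the whole file in a single
// shot with LOAD DATA ... REPLACE.
function tsv_line($inst, $ptype, $barcode, $d) {
    return implode("\t", array($inst, $ptype, $barcode,
        $d['first'], $d['min'], $d['max'], $d['call'])) . "\n";
}

// Tiny stand-in for the real $stats array built in analysis.php:
$stats = array('ExampleU' => array('student' => array(
    '35054001234567' => array('first' => '2011-01-03', 'min' => 0,
                              'max' => 2, 'call' => 'PS3545'))));

$fh = fopen('/tmp/collated.tsv', 'w');
foreach ($stats as $inst => $ptypes)
    foreach ($ptypes as $ptype => $items)
        foreach ($items as $barcode => $d)
            fwrite($fh, tsv_line($inst, $ptype, $barcode, $d));
fclose($fh);

// REPLACE decides insert-vs-overwrite by the table's PRIMARY KEY --
// assumed here to be (inst, patron_type, barcode). Guarded so the
// sketch still runs where the mysql_* extension isn't loaded.
if (function_exists('mysql_query')) {
    mysql_query("LOAD DATA LOCAL INFILE '/tmp/collated.tsv'
                 REPLACE INTO TABLE collated_stats
                 (inst, patron_type, barcode, first_seen,
                  min_renewals, max_renewals, call_no)");
}
```

One file write plus one LOAD DATA is typically orders of magnitude faster than millions of individual UPDATEs.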
...
If you only need a little more than 32MB to deal with (and again, I don't know
how memory-efficient PHP is; it's possible that 40MB in mysql might be double
the space in PHP, or even worse), you can adjust your php.ini file to change
the memory limit:
http://www.php.net/manual/en/ini.core.php#ini.memory-limit
Of course, this is only a stop-gap measure.
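If you do bump it, you can also do it per-script with ini_set() rather than editing php.ini globally -- the '256M' here is just an example value, not a recommendation:

```php
<?php
// Raise the memory cap for this one script only; size the value to
// your data and your machine ('256M' is an arbitrary example).
ini_set('memory_limit', '256M');
```

That keeps the bigger limit scoped to the one script that needs it.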
...
Personally, I'd go with a mix of these two, plus chunking it out by institution
... you should be able to get a list with a 'SELECT DISTINCT', and then just
loop through 'em.
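To make the shape of that concrete, here's a hypothetical collator for one chunk -- collate_chunk and the column names are names I made up; in real life $rows would come from mysql_fetch_assoc() on a "SELECT ... WHERE inst = '...'" result, with the outer SELECT DISTINCT loop around it:

```php
<?php
// Hypothetical sketch of the per-institution chunking: hand each
// institution's rows to a small collator so $stats never holds more
// than one institution's worth of data at a time.
function collate_chunk($rows) {
    $stats = array();
    foreach ($rows as $r) {
        $b = $r['barcode'];
        if (!isset($stats[$r['inst']][$r['ptype']][$b])) {
            // first time we see this barcode
            $stats[$r['inst']][$r['ptype']][$b] = array(
                'first' => $r['date'],
                'min'   => $r['renewals'],
                'call'  => $r['call_no'],
                'max'   => $r['renewals'],
            );
        } else {
            // subsequent instance: just track the latest renewal count
            $stats[$r['inst']][$r['ptype']][$b]['max'] = $r['renewals'];
        }
    }
    return $stats;  // dump this to the collated table, then unset() it
}
```

After each chunk you write the result out (LOAD DATA or otherwise) and unset($stats), so peak memory is bounded by the biggest single institution rather than the whole dataset.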
-Joe
ps. See, I made it all the way through the message without commenting that
'PHP' and 'grown-up programming' don't mix ... oh. crap. Never mind.