Re: Advise in optimizing an application

Michael Grant Sat, 15 Nov 2014 09:08:06 -0800

Do you have indexes on ml_email and groups_id columns? I would likely
create an index that has both in it.

There's a couple of suggestions about changing your architecture a bit:

First, from the looks of it I think you could probably combine both these
queries into one with a left join vw_mailLIST and then evaluate the outcome
to determine if you delete the item from your array. As long as the items
joined on are PKs or have an index it should be faster than running two
individual queries. Without knowing the actual table structure or data I
can't be sure but off-hand I'd think the following would be much quicker.
(Note that since I can't test there may need to be some tweaking.)

SELECT mlu.groups_id as mlu_gid, mlu.dateRESUB, vw.groups_ID as vw_gid
FROM   maillist_unsubscribe mlu

LEFT
JOIN   vw_mailLIST vw
ON     mlu.groups_id = vw.groups_id

WHERE  mlu.ml_email = vw.ml_email
AND    mlu.ml_email = <cfqueryparam value="#mailREC[i][1]#" cfsqltype=
"CF_SQL_VARCHAR">
AND    mlu.groups_id = <cfqueryparam value="#req.thisGROUPID#" cfsqltype=
"CF_SQL_INTEGER" />

You can then evaluate if dateRESUB is NULL, if mlu_gid and/or vw_gid are
null or if there's recordCount at all to determine whether you remove the
item from the the array. You can also have your if statement about the
dupFILTER there as well. This puts everything into a single loop with 50%
less queries.

Second, I would consider doing things a bit differently and instead of
looping over your array with individual queries it should be much faster if
you run a single query and then loop over the query result to determine if
you update the array. So create a comma separated list of email addy's from
your mailREC array and use it in an IN statement on mlu.ml_email. IN
statements aren't super efficient, but if you have an index on ml_email it
should still be much faster than 15,000 individual queries. Then loop over
the result set and use ArrayFind to see if the result is in the array and
if it is, delete it. The result set should only contain the results you're
looking to delete if I'm understanding your queries correctly. This
approach would be WAY faster and much more scalable since looping over
15,000 or even 100,000 results in memory is way quicker than making that
many database queries.

Hopefully you find this useful. Without having your database to query
against it can be a bit tough to determine if the code I wrote works. I do
think the architecture logic in approach two is worth the time to refactor
though.

Have a great day.

Mike

On Sat, Nov 15, 2014 at 8:44 AM, Les Mizzell <[email protected]> wrote:

>
>  > I'm still fighting with my Excel Import application a
>  > bit. Basically, it works pretty well, as long as
>  > there's a limited number of records being imported.
>
> Was testing last night with the 15,000 record Excel sheet.
> If I comment out the two filter queries in the code, it will read in the
> sheet and a cfdump of the array shows everything was imported properly.
> Didn't time it, but it only took a few minutes. Adding those two queries
> back in and trying to import the entire sheet - the server choked and I had
> to call support to reboot it. Whoops.
>
> It's the two filter queries below that are slowing the whole thing down
> and choking the server. The one optimization I can think of is the look for
> duplicates query. Instead of running that against the entire database
> looking for matches (several hundred thousand records last time I checked),
> I should do a query of queries and run the loop ONLY against records for
> that one group, which I'd prefetch before starting the loop. That should
> help a lot. Working on that now...
>
> One thought I had - I don't care how long the import of a large sheet
> takes. It just can't choke the server. Would there be any benefit of
> putting in a DELAY for every 500 records (or so) in the loop to give the
> server a "rest" to do other stuff?
>
> <cfif i MOD 500 EQ 0>
>     <cfset sleep(5000) />
> </cfif>
>
>
>
> FILTER QUERIES
> <!--- Before we can save the list, we need to run two checks  --->
> <!--- First, is there an email address in this list that has previously
> unsubscribed? --->
> <cfoutput>
>   <cfloop from="#arrayLen(mailREC)#" to="1" step="-1" index="i">
>       <!--- delete the row if it was an invalid address. Got to be a way
> to fix this but --->
>       <!--- the above "data" output block wants to add an invalide row if
> you try to skip the entire row when the address is invalid --->
>      <cfif mailREC[i][1] eq "-">
>         <cfset ArrayDeleteAt(mailREC, #i#)>
>      <cfelse>
>       <!--- OK, it's a valid address. Have they unsubscribed before? --->
>          <cfquery name="unsubbed" datasource="#req.datasource#"
> username="#username#" password="#password#">
>              SELECT groups_id FROM maillist_unsubscribe
>              WHERE dateRESUB is NULL
>              AND ml_email = <cfqueryparam value="#mailREC[i][1]#"
> cfsqltype="CF_SQL_VARCHAR">
>              AND groups_id = <cfqueryparam value="#req.thisGROUPID#"
> cfsqltype="CF_SQL_INTEGER" />
>          </cfquery>
>        <cfif unsubbed.recordcount EQ 1><cfset ArrayDeleteAt(mailREC,
> #i#)></cfif>
>      </cfif>
>     </cfloop>
>
>    <!--- if Admin has checked the duplicate filter box on the form, but
> sure nothing incoming is a duplicate already in the database for this group
> --->
>    <cfif form.dupFILTER EQ "on">
>     <cfloop from="#arrayLen(mailREC)#" to="1" step="-1" index="i">
>          <cfquery name="ckDUP" datasource = "#req.datasource#"
> username="#username#" password="#password#">
>              Select groups_id from vw_mailLIST
>              WHERE groups_id = <cfqueryparam
> value="#trim(req.thisGROUPID)#" cfsqltype="CF_SQL_INTEGER"  />
>              AND ml_email = <cfqueryparam cfsqltype="cf_sql_varchar"
> value="#mailREC[i][1]#" />
>          </cfquery>
>          <cfif ckDUP.recordcount EQ 1><cfset ArrayDeleteAt(mailREC,
> #i#)></cfif>
>     </cfloop>
>    </cfif>
> </cfoutput>
>
>
>
>
>
>
> 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
Order the Adobe Coldfusion Anthology now!
http://www.amazon.com/Adobe-Coldfusion-Anthology/dp/1430272155/?tag=houseoffusion
Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:359648
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/groups/cf-talk/unsubscribe.cfm

Re: Advise in optimizing an application

Reply via email to