here is a custom tag i wrote a while back to do just what you want i believe
if anyone wants to workaround the evalute()'s feel free and let me see : )

someone i know had a huge DB table full of emails with duplicates and this
is what i came up with to help... then i said set all email fiields to NO
DUPLICATES when you make the table : )


<!--- {REQUIRED) The data source to use in the queries --->
<CFPARAM NAME="ATTRIBUTES.DataSource" DEFAULT="" TYPE="STRING">

<!--- {REQUIRED}   The table name to query for duplicates --->
<CFPARAM NAME="ATTRIBUTES.TableName" DEFAULT="" TYPE="STRING">

<!--- {REQUIRED} The primary key of the table defined in
ATTRIBUTES.tbl_name" --->
<CFPARAM NAME="ATTRIBUTES.PrimaryKey" DEFAULT="" TYPE="STRING">

<!--- {REQUIRED} The name of the column to check duplicates for --->
<CFPARAM NAME="ATTRIBUTES.Column" DEFAULT="" TYPE="STRING">

<!--- {OPTIONAL} To NOT display output to the screen (OF DUPLICATES ONLY)
set this to NO --->
<CFPARAM NAME="ATTRIBUTES.DisplayOutput" DEFAULT="Yes" TYPE="STRING">

<!--- {OPTIONAL} Set Mode to "ViewOnly" to see the duplicates and
information that will be deleted Without Deleting anything --->
<CFPARAM NAME="ATTRIBUTES.Mode" DEFAULT="Process" TYPE="STRING">



<!---***********************************************// ERROR CHECKS
//*******************************************************--->

     <!--- If "ATTRIBUTES.Column is undefined, inform the user that it is
required --->
  <cfif ATTRIBUTES.Column EQ "">
   <cfthrow message="The attribute ''<b>Column</b>'' is required. It is the
column you want to check duplicates for.">
  </cfif>

     <!--- If "ATTRIBUTES.DisplayOutput is not equal to Yes or No, inform
the user that these are the only acceptable values --->
  <cfif ATTRIBUTES.DisplayOutput NEQ "YES" AND ATTRIBUTES.DisplayOutput NEQ
"No">
   <cfthrow message="''Yes'' and ''No'' are the only acceptable values for
''<b>DisplayOutput</b>''. Default is Yes">
  </cfif>

  <!--- If ATTRIBUTES.Mode is not equal to "Process" or "ViewOnly", inform
the user that these are the only acceptable values --->
  <cfif ATTRIBUTES.Mode NEQ "Process" AND ATTRIBUTES.Mode NEQ "ViewOnly">
   <cfthrow message="''Process'' and ''ViewOnly'' are the only acceptable
values for ''<b>Mode</b>''. Default is Process">
  </cfif>

     <!--- If ATTRIBUTES.DataSource was not defined, inform the user --->
     <cfif ATTRIBUTES.DataSource IS "">
   <cfthrow message="The attribute, ''<b>DataSource</b>'' is required and
must be a valid datasource.">
  </cfif>

     <!--- If  ATTRIBUTES.TableName was not defined, inform the user --->
     <cfif ATTRIBUTES.TableName IS "">
   <cfthrow message="The attribute, ''<b>TableName</b>'' is required and
must be a valid table name in the datasource defined in
''<b>DataSource</b>''.">
  </cfif>

     <!--- If  ATTRIBUTES.PrimaryKey was not defined, inform the user --->
     <cfif ATTRIBUTES.PrimaryKey IS "">
   <cfthrow message="The attribute, ''<b>PrimaryKey</b>'' is required and
must be a the tables Unique Identifier.">
  </cfif>

  <!--- If ATTRIBUTES.Mode is ViewOnly but ATTRIBUTES.DisplayOutput is No,
Inform the user that this accomplishes nothing --->
     <cfif ATTRIBUTES.Mode IS "ViewOnly" AND ATTRIBUTES.DisplayOutput NEQ
"YES">
   <cfthrow message="Displaying <b>No</b> output in ''<b>ViewOnly</b>'' mode
will not accomplish anything.">
  </cfif>

<!---*******************************************// END ERROR CHECKS
//******************************************************--->

<!--- The Final list of Unique identifiers that will be deleted from the
database will be stored in "FinalList"--->
<CFPARAM NAME="FinalList" DEFAULT="">

<!--- A temp list to hold the UID's of the current item's duplicates --->
<CFPARAM NAME="TempList" DEFAULT="">

<!--- Get ALL specified fields and Primary keys to loop over, checking for
duplicates --->
<cfquery name="GetAll" datasource="#ATTRIBUTES.DataSource#"
blockfactor="100">
 SELECT #ATTRIBUTES.PrimaryKey#, #ATTRIBUTES.Column#
 FROM #ATTRIBUTES.TableName#
 ORDER by #ATTRIBUTES.Column#
</cfquery>

<!--- Select UNIQUE Entries to compare against the first query to get a
count of extras (This is used for display only)--->
<cfquery name="GetDistinct" datasource="#ATTRIBUTES.DataSource#"
blockfactor="50">
 SELECT DISTINCT #ATTRIBUTES.Column#
 FROM #ATTRIBUTES.TableName#
 ORDER BY #ATTRIBUTES.Column#
</cfquery>


<!--- If displayOutput is YES output a list of duplicates (if any) --->
<cfif ATTRIBUTES.DisplayOutput IS "YES">

<!--- Extra ID's will be the first queries recordcount minus the second
queries recordcount --->
<cfset extras = GetAll.RecordCount - GetDistinct.RecordCount>

<cfoutput>
Before Purge Of Duplicates:<br>
------------------------------------------------------<br>
<b>#GetAll.RecordCount#</b> entries in the database.
<b>#GetDistinct.RecordCount#</b> are unique <i>(<b>#extras#</b>
extras)</i><br>
------------------------------------------------------<br><br>
<cfif GetAll.RecordCount EQ GetDistinct.RecordCount>
 <b>There were no duplicates in the database. No data will be deleted.</b>
</cfif>

</cfoutput>

</cfif>

<!--- loop through grouping by the field chosen to check duplicates for --->
<cfoutput query="GetAll" group="#ATTRIBUTES.Column#">

   <!--- get duplicate emails --->
   <cfquery name="GetDups" datasource="#ATTRIBUTES.DataSource#"
blockfactor="50">
 SELECT #ATTRIBUTES.PrimaryKey#, #ATTRIBUTES.Column#
 FROM #ATTRIBUTES.TableName#
 WHERE #ATTRIBUTES.Column# = <cfif NOT
IsNumeric(ATTRIBUTES.Column)>'</cfif>#evaluate(ATTRIBUTES.COLUMN)#<cfif NOT
IsNumeric(ATTRIBUTES.Column)>'</cfif>
   </cfquery>

   <!--- if the Information is entered more than once --->
   <cfif GetDups.RecordCount GT 1>

   <!--- Set all of the Uniquie identifiers, that are associated with the
current return, to a temporary list --->
   <cfset TempList = Evaluate("ValueList(GetDups.#Attributes.PrimaryKey#)")>

   <cfif ATTRIBUTES.DisplayOutput IS "YES">
    <b>#Evaluate(ATTRIBUTES.Column)#</b> is in the database
<b>#getdups.recordcount#</b> times<br><br>
   </cfif>

   <!--- Remove ONE of the current Itmes PrimaryKeys from the list, so we
keep ONE of their entries in the database when we loop over the list to
delete --->
   <cfset TempList = ListDeleteAt(TempList, 1, ",")>
   </cfif>

   <!--- Append the temp list we made during THIS loop to the larger
collection of Primary Keys if it was not empty--->
   <cfif templist NEQ "">
    <cfset FinalList = listappend(FinalList, TempList, ",")>

   <!--- reset the temp list so we can start over --->
   <cfset templist="">

   </cfif>

</cfoutput>

  <cfif ATTRIBUTES.DisplayOutput IS "YES" AND len(FinalList)>
   <cfoutput>
     <br>
     <b>The Following list is a list of Primary Keys that will need to be
deleted to rid the table of duplicate entries</b>
     <br><br>
     #FinalList#
   </cfoutput>
  </cfif>

<!--- If ATTRIBUTES.Mode is Process, then delete all entries with A primary
key that is found in our final list --->
<cfif ATTRIBUTES.Mode IS "Process">

 <!--- I did not use "IN(List)" so each delete query could be seen if debug
output is on --->
  <cfloop list="#FinalList#" index="i">
   <cfquery name="DeleteEntries" datasource="#ATTRIBUTES.DataSource#">
    DELETE
    FROM #ATTRIBUTES.TableName#
    WHERE #ATTRIBUTES.PrimaryKey# = <cfif NOT IsNumeric(I)>'</cfif>#i#<cfif
NOT IsNumeric(I)>'</cfif>
   </cfquery>
  </cfloop>

</cfif>






"Paul Hastings" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> > Hi list, with the following script (below) I am
> > checking duplicate eMails and I delete
> > them after checking them.
>
> you don't say what db BUT if its sql server (or another db with similar
> index functionality), i'd just let the db do the hard work:
> create another table duplicating your original table but put a unique
index
> with ignore duplicate key property on your email column.
>
> CREATE UNIQUE INDEX emailIDX
> ON blabla(email)
>    WITH IGNORE_DUP_KEY
>
> then simply INSERT/SELECT from the old table into the new one. sql server
> will accept the first email, but throw away (ignore) all the duplicate
ones.
>
>
>
>
> 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
Archives: http://www.houseoffusion.com/cf_lists/index.cfm?forumid=4
Subscription: 
http://www.houseoffusion.com/cf_lists/index.cfm?method=subscribe&forumid=4
FAQ: http://www.thenetprofits.co.uk/coldfusion/faq

Your ad could be here. Monies from ads go to support these lists and provide more 
resources for the community. 
http://www.fusionauthority.com/ads.cfm

                                Unsubscribe: 
http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=89.70.4
                                

Reply via email to