I wrote a quick script (30 lines or so) and it works great (ran on 5000+ rows - about 4-6 words per row, in under a second).
You need to change the 3 areas below with your database specific information. You will also need to have two custom functions (listGetAt and listLen) - I've created my own version of these for SQL for things like this in the past. I'm including them at the bottom as well - they are great to have in your DB anyway!! =)
-Nate
this should work for MS SQL or Oracle :
------------------------------------------------------------------------------------------------------- set nocount on;
create table #wordlist (word varchar(100))
declare @listItem varchar(8000), @listIndex int
declare listItemsCursor cursor for
-------CHANGE BELOW:query to get the column with the words you want to count :
SELECT display_name from categories
open listItemsCursor
fetch next from listItemsCursor into @listItem
while @@FETCH_STATUS = 0 begin set @listIndex = 1
-------CHANGE BELOW:specify database name below where it says 'dbname.dbo.' etc
while @listIndex <= dbname.dbo.listLen(@listItem, ' ')
begin
-------CHANGE BELOW:specify database name below where it says 'dbname.dbo.' etc
insert into #wordlist (word) values (dbname.dbo.listGetAt(@listItem, @listIndex,' '))
set @listIndex = @listIndex + 1
end
fetch next from listItemsCursor into @listItem
end
close listItemsCursor deallocate listItemsCursor
set nocount off;
select count(word), word from #wordlist group by word order by count(word) desc
drop table #wordlist
---------------------------------------------------------------------------------------------------
Here are the two user defined functions you'll need to run before the above will work
-------------------------------------------------------------------------------------------------- SET QUOTED_IDENTIFIER ON GO SET ANSI_NULLS ON GO
CREATE FUNCTION listGetAt
(
@list as varchar(8000),
@indexToGet as int,
@delim as varchar(1)
)
RETURNS varchar(8000)
AS
BEGIN
declare @listLen int, @parseTemp varchar(8000), @charStart int, @charEnd int,
@element varchar(8000), @doubleDelim varchar(2),
@foundit int
set @list = rtrim(ltrim(@list))
set @foundit = 0
set @listLen = 0
--set @indexToGet = @indexToGet
set @parseTemp = @list
set @doubleDelim = @delim + @delim
-- remove duplicate commas / empty items
set @parseTemp = replace(@parseTemp,@doubleDelim,@delim)
set @parseTemp = replace(@parseTemp,@doubleDelim,@delim)
set @indexToGet = @indexToGet - 1 while (len(@parseTemp) > 0 and @listLen <= 10 and @foundit = 0) begin set @charStart = charindex(@delim,@parseTemp) set @charEnd = charindex(@delim,@parseTemp,@charStart)
if @charEnd > 0
begin
if (@listLen) = @indexToGet
begin
-- this is the one
set @element = (substring(@parseTemp,1,(@charEnd - 1)))
set @foundit = 1
end if(len(substring(@parseTemp,(@charStart - 1),(@charEnd - 1)))) > 0
begin
set @listLen = @listLen + 1
end
set @parseTemp = right(@parseTemp,(len(@parseTemp) - @charEnd)) end
else
-- its the last element in the list
begin
set @element = @parseTemp
set @foundit = 1
set @listLen = @listLen + 1
endend
return (@element) END
GO SET QUOTED_IDENTIFIER OFF GO SET ANSI_NULLS ON GO -------------------------------------------------------------------------------------------- SET QUOTED_IDENTIFIER ON GO SET ANSI_NULLS OFF GO
CREATE FUNCTION listLen (@list varchar(8000), @delim varchar(1))
RETURNS int AS
BEGIN
declare @listLen int, @parseTemp varchar(8000), @charStart int, @charEnd int, @element varchar(8000)
set @listLen = 0
set @parseTemp = ltrim(rtrim(@list))
while len(@parseTemp) > 0 and @listLen <= 1000
begin
set @charStart = charindex(@delim,@parseTemp)
set @charEnd = charindex(@delim,@parseTemp,@charStart)
if @charEnd > 0
begin
--set @element = substring(@parseTemp,(@charStart - 1),(@charEnd - 1))
--print @element
if(len(substring(@parseTemp,(@charStart - 1),(@charEnd - 1)))) > 0
begin
set @listLen = @listLen + 1
end
--print @parseTemp
set @parseTemp = right(@parseTemp,(len(@parseTemp) - @charEnd))
--print @charEnd end
else
begin
if len(@charEnd) > 0
begin
set @listLen = @listLen + 1
end
set @parseTemp = ''
end
--print @listLen
end
return(@listLen)
ENDGO SET QUOTED_IDENTIFIER OFF GO SET ANSI_NULLS ON GO ---------------------------------------------------------------------------------
----- Original Message ----- From: "C. Hatton Humphrey" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, April 06, 2005 8:26 AM
Subject: Re: Here's a new one...
I want to generated a list of, say the top 10 words used in entries in a particular column of the database.
Interestingly, Ron and Matt both posted the solution that I decided to use as my "morning coding challenge"
I have the code written and will send to anyone interested - consider it an open-source gift to the community. I'm posting it on CFLib.org as I write this but if anyone wants the code directly I'll send it to them.
Hatton
-- No virus found in this outgoing message. Checked by AVG Anti-Virus. Version: 7.0.308 / Virus Database: 266.9.3 - Release Date: 4/5/2005
---------------------------------------------------------- To post, send email to [email protected] To unsubscribe: http://www.dfwcfug.org/form_MemberUnsubscribe.cfm To subscribe: http://www.dfwcfug.org/form_MemberRegistration.cfm
----------------------------------------------------------
To post, send email to [email protected]
To unsubscribe: http://www.dfwcfug.org/form_MemberUnsubscribe.cfm
To subscribe: http://www.dfwcfug.org/form_MemberRegistration.cfm
