The best way to do this off the top of my head would be to use some clever t-sql to take care of it DB-side.

I wrote a quick script (30 lines or so) and it works great (ran on 5000+ rows - about 4-6 words per row, in under a second).

You need to change the 3 areas below with your database specific information. You will also need to have two custom functions (listGetAt and listLen) - I've created my own version of these for SQL for things like this in the past. I'm including them at the bottom as well - they are great to have in your DB anyway!! =)

-Nate


this should work for MS SQL or Oracle :

-------------------------------------------------------------------------------------------------------
set nocount on;

create table #wordlist (word varchar(100))

declare @listItem varchar(8000), @listIndex int

declare listItemsCursor cursor for

-------CHANGE BELOW:query to get the column with the words you want to count :
SELECT display_name from categories
open listItemsCursor


fetch next from listItemsCursor into @listItem

while @@FETCH_STATUS = 0
begin
set @listIndex = 1

-------CHANGE BELOW:specify database name below where it says 'dbname.dbo.' etc
while @listIndex <= dbname.dbo.listLen(@listItem, ' ')
begin
-------CHANGE BELOW:specify database name below where it says 'dbname.dbo.' etc
insert into #wordlist (word) values (dbname.dbo.listGetAt(@listItem, @listIndex,' '))
set @listIndex = @listIndex + 1
end
fetch next from listItemsCursor into @listItem
end


close listItemsCursor
deallocate listItemsCursor

set nocount off;

select count(word), word from #wordlist group by word order by count(word) desc

drop table #wordlist

---------------------------------------------------------------------------------------------------


Here are the two user defined functions you'll need to run before the above will work



-------------------------------------------------------------------------------------------------- SET QUOTED_IDENTIFIER ON GO SET ANSI_NULLS ON GO



CREATE FUNCTION listGetAt
(
@list as varchar(8000),
@indexToGet as int,
@delim as varchar(1)
)
RETURNS varchar(8000)
AS
BEGIN
declare @listLen int, @parseTemp varchar(8000), @charStart int, @charEnd int,
@element varchar(8000), @doubleDelim varchar(2),
@foundit int
set @list = rtrim(ltrim(@list))
set @foundit = 0
set @listLen = 0
--set @indexToGet = @indexToGet
set @parseTemp = @list
set @doubleDelim = @delim + @delim
-- remove duplicate commas / empty items
set @parseTemp = replace(@parseTemp,@doubleDelim,@delim)
set @parseTemp = replace(@parseTemp,@doubleDelim,@delim)


set @indexToGet = @indexToGet - 1
while (len(@parseTemp) > 0 and @listLen <= 10 and @foundit = 0)
begin
 set @charStart = charindex(@delim,@parseTemp)
 set @charEnd = charindex(@delim,@parseTemp,@charStart)

  if @charEnd > 0
   begin
    if (@listLen) = @indexToGet
    begin
     -- this is the one
     set @element = (substring(@parseTemp,1,(@charEnd - 1)))
     set @foundit = 1
    end

    if(len(substring(@parseTemp,(@charStart - 1),(@charEnd - 1)))) > 0
    begin
     set @listLen = @listLen + 1
    end
    set @parseTemp = right(@parseTemp,(len(@parseTemp) - @charEnd))

   end
  else
   -- its the last element in the list
   begin
     set @element =  @parseTemp
     set @foundit = 1
     set @listLen = @listLen + 1
   end

end

return (@element)
END



GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO
--------------------------------------------------------------------------------------------
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_NULLS OFF
GO



CREATE FUNCTION listLen (@list varchar(8000), @delim varchar(1))
RETURNS int AS
BEGIN
declare @listLen int, @parseTemp varchar(8000), @charStart int, @charEnd int, @element varchar(8000)
set @listLen = 0
set @parseTemp = ltrim(rtrim(@list))


while len(@parseTemp) > 0 and @listLen <= 1000
begin
 set @charStart = charindex(@delim,@parseTemp)
 set @charEnd = charindex(@delim,@parseTemp,@charStart)
 if @charEnd > 0
  begin
   --set @element = substring(@parseTemp,(@charStart - 1),(@charEnd - 1))
   --print @element
   if(len(substring(@parseTemp,(@charStart - 1),(@charEnd - 1)))) > 0
   begin
    set @listLen = @listLen + 1
   end
    --print @parseTemp
    set @parseTemp = right(@parseTemp,(len(@parseTemp) - @charEnd))
    --print @charEnd

  end
 else
  begin
   if len(@charEnd) > 0
    begin
     set @listLen = @listLen + 1
    end
   set @parseTemp = ''
  end
 --print @listLen
end
return(@listLen)
END



GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO
---------------------------------------------------------------------------------






----- Original Message ----- From: "C. Hatton Humphrey" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, April 06, 2005 8:26 AM
Subject: Re: Here's a new one...



I want to generated a list of, say the top 10 words used in entries in a particular column of the database.

Interestingly, Ron and Matt both posted the solution that I decided to use as my "morning coding challenge"


I have the code written and will send to anyone interested - consider it an open-source gift to the community. I'm posting it on CFLib.org as I write this but if anyone wants the code directly I'll send it to them.

Hatton


-- No virus found in this outgoing message. Checked by AVG Anti-Virus. Version: 7.0.308 / Virus Database: 266.9.3 - Release Date: 4/5/2005

----------------------------------------------------------
To post, send email to [email protected]
To unsubscribe: http://www.dfwcfug.org/form_MemberUnsubscribe.cfm
To subscribe: http://www.dfwcfug.org/form_MemberRegistration.cfm



----------------------------------------------------------
To post, send email to [email protected]
To unsubscribe: http://www.dfwcfug.org/form_MemberUnsubscribe.cfm
To subscribe: http://www.dfwcfug.org/form_MemberRegistration.cfm





Reply via email to