On 18 Jan. 07, at 14:47, Chuck Pelto wrote:

Is anyone familiar with a way to get all the words out of a string as succinct elements?

Regards,

Chuck

If by 'succinct elements' you mean words separated by spaces, the Split function will do the job. But if you're working on a real text, the words will often be separated by periods, commas and other delimiters as well. You would need to run Split several times: first to cut the text into sentences (delimited by periods), then into clauses (delimited by commas), and then into words (delimited by spaces). I couldn't find any plug-in that gets the words out of a text in a straightforward way. (Personal note: I'm interested in lexicometry, i.e. statistics applied to texts. If anyone shares this interest, please let me know of any RB plug-ins or applications, free and running on a Mac, for that kind of work.)
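For comparison only, here is a minimal Python sketch (not REALbasic) of the repeated-split approach described above. The function name `split_on_each_delimiter` and its delimiter list are hypothetical. It shows why one split on spaces is not enough (punctuation stays attached to the words) and how splitting once per delimiter, then dropping empty pieces, gets around that.

```python
def split_on_each_delimiter(text, delimiters=".,;:!? "):
    """Split text on every delimiter in turn, then drop empty pieces."""
    pieces = [text]
    for d in delimiters:
        # one full pass of split per delimiter: sentences, clauses, then words
        next_pieces = []
        for p in pieces:
            next_pieces.extend(p.split(d))
        pieces = next_pieces
    # consecutive delimiters (", " or ". ") leave empty strings behind
    return [p for p in pieces if p]


sentence = "Hello, world. How are you?"
print(sentence.split(" "))                # ['Hello,', 'world.', 'How', 'are', 'you?']
print(split_on_each_delimiter(sentence))  # ['Hello', 'world', 'How', 'are', 'you']
```

Each delimiter costs a full pass over the text, which is the inefficiency the post complains about.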
So if that's what you're looking for, have a look at the code below:

' 1) s is the string containing the text you want to extract the words from. The do...loop replaces double spaces with single spaces (it seems that when you paste a text into an EditField, extra spaces or carriage returns are sometimes added). To do that, we need to work with a temporary string, st.

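  dim st as String
  ' repeat until a full pass of ReplaceAll leaves the string unchanged
  do
    st = s
    s=ReplaceAll(s,"  "," ")
  loop until st=s
  ' remove leading and trailing whitespace
  s=LTrim(s)
  s=RTrim(s)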
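If you want to try the logic of step 1 outside RB, here is a rough Python equivalent (a sketch; `collapse_spaces` is a hypothetical name). Note that Python's `str.strip()` removes all leading and trailing whitespace, which may not match LTrim/RTrim exactly.

```python
def collapse_spaces(s: str) -> str:
    """Collapse runs of spaces into single spaces, then trim both ends."""
    while True:
        previous = s              # temporary copy, playing the role of st
        s = s.replace("  ", " ")  # one pass over all non-overlapping double spaces
        if s == previous:         # nothing changed: no double spaces remain
            break
    return s.strip()              # like LTrim followed by RTrim


print(repr(collapse_spaces("  The   quick  brown ")))  # 'The quick brown'
```

The loop is needed because a single pass only halves each run: three spaces become two, and only the next pass turns them into one.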

'2) nbChar is the number of characters in the text. It is used below as the bound of the scanning loops.

  dim nbChar as Integer
  nbChar=s.len()


'3) We extract each word and store it in the array aListeMots, which needs to be declared first (it's French for aListWords, in case you're wondering). aListeMotsC1 and aListeMotsCd are two other arrays that store the positions of the first and last characters of each word stored in aListeMots. You may not need this information; in that case, some lines of code can be deleted. If you do need it, you could probably use a single multidimensional array instead of three, but I wasn't sure how to do that.
'The string separateurs ('delimiters') is a list of characters that should be treated like blank spaces, inasmuch as they separate words. It includes rc, the carriage return (EndOfLine.Macintosh in RB).


  dim separateurs as string
  dim rc as string
  rc=EndOfLine.Macintosh   ' carriage return, Chr(13)
  separateurs=",.! ?'¡¿:;<>()"+rc
  dim i,j,c1,cd as integer
  dim vChar,vChar2 as string

  ' empty the three arrays (they must already be declared, as explained above)
  redim aListeMots (-1)
  redim aListeMotsC1 (-1)
  redim aListeMotsCd (-1)


  i=1
  c1=0
  cd=0

'The following loop reads each character and checks whether it is a delimiter. When it is not, a word starts there: a second, inner loop then scans forward to the delimiter that ends this word, and aListeMots, aListeMotsC1 and aListeMotsCd are filled with the word and the positions of its first and last characters. Scanning then resumes from that delimiter.

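  do until i>nbChar
    vChar = mid(s,i,1)
    if inStr(separateurs, vChar)=0 then
      ' vChar is not a delimiter: a word starts at position i
      c1=i
      j=i+1
      vChar2 = mid(s,j,1)
      ' advance j to the first delimiter after the word (or past the end of s)
      do until j>nbChar or inStr(separateurs,vChar2)>0
        j=j+1
        vChar2 = mid(s,j,1)
      loop
      ' the word ends just before that delimiter
      cd=j-1
      aListeMots.append mid(s,c1,(cd-c1+1))
      aListeMotsC1.append c1
      aListeMotsCd.append cd
      ' resume scanning at the delimiter
      i=j
    else
      ' vChar is a delimiter: skip it
      i=i+1
    end if
  loop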
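To check the scanning logic, here is a line-by-line Python port of the two nested loops (a sketch for testing, not the original RB code; `extract_words` is a hypothetical name). Positions are kept 1-based so they match RB's Mid and the arrays listed below, and the separator string copies separateurs, with `"\r"` standing in for rc.

```python
def extract_words(s, separators=",.! ?'¡¿:;<>()\r"):
    """Return three parallel lists: words, 1-based first and last positions."""
    words, firsts, lasts = [], [], []
    n = len(s)                     # nbChar
    i = 1
    while i <= n:
        if s[i - 1] not in separators:  # not a delimiter: a word starts at i
            c1 = i
            j = i + 1
            # inner loop: advance j to the next delimiter (or past the end)
            while j <= n and s[j - 1] not in separators:
                j += 1
            cd = j - 1                  # last character of the word
            words.append(s[c1 - 1:cd])
            firsts.append(c1)
            lasts.append(cd)
            i = j                       # resume at the delimiter
        else:
            i += 1                      # delimiter: skip it
    return words, firsts, lasts


words, firsts, lasts = extract_words("The quick brown fox jumps over the lazy dog")
print(words)
print(firsts)  # [1, 5, 11, 17, 21, 27, 32, 36, 41]
print(lasts)   # [3, 9, 15, 19, 25, 30, 34, 39, 43]
```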

'The work is done. With the famous pangram "The quick brown fox jumps over the lazy dog", we get three arrays:

aListeMots    aListeMotsC1    aListeMotsCd
The           1               3
quick         5               9
brown         11              15
fox           17              19
jumps         21              25
over          27              30
the           32              34
lazy          36              39
dog           41              43
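On the "single multidimensional array" idea: one common alternative is a single list of records, one per word, so the word and its two positions cannot drift out of sync. The Python sketch below (hypothetical names `Word` and `find_words`) does this, and swaps the hand-written loops for a different technique: a regular expression matching runs of non-separator characters. `re.finditer` reports 0-based offsets, so 1 is added to the start; the exclusive end offset already equals the 1-based position of the last character.

```python
import re
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    first: int  # 1-based position of the first character
    last: int   # 1-based position of the last character


# same characters as separateurs in the RB code, with "\r" standing in for rc
SEPARATORS = ",.! ?'¡¿:;<>()\r"
# one or more characters that are not separators
WORD_RE = re.compile("[^" + re.escape(SEPARATORS) + "]+")


def find_words(s: str) -> List[Word]:
    # m.start() is 0-based, so add 1; m.end() is 0-based and exclusive,
    # which equals the 1-based position of the last character
    return [Word(m.group(), m.start() + 1, m.end()) for m in WORD_RE.finditer(s)]


for w in find_words("The quick brown fox jumps over the lazy dog"):
    print(w.text, w.first, w.last)
```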

Hope this helps.
Regards,

Octave
_______________________________________________
Unsubscribe or switch delivery mode:
<http://www.realsoftware.com/support/listmanager/>

Search the archives of this list here:
<http://support.realsoftware.com/listarchives/lists.html>
