Hi,

could you give me some coaching in using streams ?

I would like to try your tool. I am trying to read an existing html file
into a character stream, call your tool, and save the result in a new txt
file:

Set file = ##class(%File).%New("test.html")
Set txtfile = ##class(%File).%New("text.txt")

Set tmpstream = ##class(%GlobalCharacterStream).%New()
set status = file.Open("RU") ; same flags as OPEN command--use "U" for streams
if ('status) do $system.OBJ.DisplayError(status)
Do tmpstream.CopyFrom(file)
do ##class(String.Tools).HTMLToText(.tmpstream)
// do tmpstream.Rewind() ... when using this txt looks like original html
set txt = tmpstream.Read()
w txt
Do tmpstream.%Close()
Do file.%Close()

Set status = txtfile.Open("NW")
if ('status) do $system.OBJ.DisplayError(status)
set sc = txtfile.Write(txt)
if (' sc) do $system.OBJ.DisplayError(sc)
Do txtfile.Close()

>>> It seems that I get the text displayed in the terminal and an empty txt file?

thanks
Werner
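
A possible explanation, going only by the attached class: HTMLToText appears to Print its result to the current device and never writes anything back to the stream. It also reads the stream to its end, so a later Read() would return an empty string, and a Rewind() would simply return the original HTML. The sketch below reproduces that behaviour in Python as an analogy only. The function html_to_text_prints is a hypothetical stand-in, not the real method, and capturing the printed output is just one way the result could be collected.

```python
import contextlib
import io


def html_to_text_prints(stream: io.StringIO) -> None:
    """Hypothetical stand-in: prints its result and leaves the stream unchanged."""
    stream.seek(0)                       # comparable to stream.Rewind()
    html = stream.read()                 # consumes the stream to its end
    # The converted text goes to standard output only.
    print(html.replace("<p>", "").replace("</p>", ""))


stream = io.StringIO("<p>hello</p>")

html_to_text_prints(stream)
after_call = stream.read()               # stream is at its end: empty string

stream.seek(0)
after_rewind = stream.read()             # rewinding yields the original HTML

# One way to collect the result: capture what the function prints.
captured_buffer = io.StringIO()
with contextlib.redirect_stdout(captured_buffer):
    html_to_text_prints(stream)
captured = captured_buffer.getvalue()    # the converted text plus a newline
```

If this reading of the class is right, the empty txt file follows from writing the empty Read() result, and the text seen in the terminal comes from the Print inside the method.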

"Doug Hendricks" <[EMAIL PROTECTED]> schrieb im Newsbeitrag
news:[EMAIL PROTECTED]
Guess it would help to attach the file, huh ?  :)

Doug Hendricks wrote:

>
> Until Rob's goodies are available, the attached string/stream friendly
> HTML/XML-Stripper (Cache 5.x) class might prove useful.  Use the
> HTMLToText Method to achieve what you want.  It even reformats the
> remaining text to the desired line length.
>
> Doug Hendricks
>
> Rob Tweed wrote:
>
>> Watch out for something interesting coming out soon that may help in
>> this kind of scenario - our HTML to XHTML converter.  This is a core
>> subsystem that now forms the heart of a whole bunch of technologies
>> and solutions I've been working on recently.  Provided your starting
>> point is HTML, albeit that it may include JavaScript, CSS, PHP script,
>> COS within <script> tags, it can all be converted to XHTML, and once
>> in that format, it can be transformed in any way you like using  DOM
>> API methods.
>>
>> Rob
>>
>> On Thu, 08 Jul 2004 23:14:59 -0400, Denver Braughler
>> <[EMAIL PROTECTED]> wrote:
>>
>>
>>>> does anybody know how to strip out html tags from a text using regexp
>>>
>>>
>>> If Caché has regular expressions, that is good news to me.
>>>
>>>
>>>> (or anything else)
>>>
>>>
>>> The DOM parser (Robb Tweed?) might help you depending on your purpose.
>>>
>>> I use an edit utility classmethod that replaces from string1 to
>>> string2 with string3 for every occurrence.
>>> I strip everything from < to >, and replace specific &...;
>>> occurrences with a single character.
>>> You could start by stripping everything from <script> to </script>.
>>> What the PHP code does is less general.
>>>
>>> Where I have had numerous such edits, I have listed them in $text()
>>> and iterated through them at runtime.
>>
>>
>>
>> ---
>> Rob Tweed
>> M/Gateway Developments Ltd
>> Global DOMination with eXtc : http://www.mgateway.com
>> ---
>
>
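
The edit-table approach Denver describes above (strip script blocks first, then everything from < to >, then replace specific entities, iterating over a list of edits) could be sketched roughly as follows. This is an illustration in Python using regular expressions, which the thread does not confirm are available in Caché. The EDITS table, the name strip_markup, and the particular entities chosen are assumptions made for the example.

```python
import re

# Ordered edit table of (pattern, replacement) pairs. Order matters: script
# blocks are removed first so their contents do not survive tag stripping.
EDITS = [
    (re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL), ""),
    (re.compile(r"<[^>]*>"), ""),   # everything from < to >
    (re.compile(r"&nbsp;"), " "),
    (re.compile(r"&lt;"), "<"),
    (re.compile(r"&gt;"), ">"),
    (re.compile(r"&amp;"), "&"),    # last, so "&amp;lt;" becomes "&lt;", not "<"
]


def strip_markup(text: str) -> str:
    """Apply every edit in EDITS, in order, to the whole text."""
    for pattern, replacement in EDITS:
        text = pattern.sub(replacement, text)
    return text


sample = '<p>a &lt; b &amp; c</p><script>var x = "<b>";</script>'
result = strip_markup(sample)
```

Keeping the edits in a data table rather than in code mirrors the idea of listing them in $text() and iterating at runtime: adding a new rule means adding a row, not a new branch.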




--------------------------------------------------------------------------------


> <?xml version="1.0" encoding="UTF-8"?>
> <Export generator="Cache" version="9" zv="Cache for Windows NT (Intel/P4) 5.0.7 (Build 5000U)" ts="2004-07-09 15:00:54">
> <Class name="String.Tools">
> <Abstract>1</Abstract>
> <IncludeCode>%stringreplace</IncludeCode>
> <ProcedureBlock>1</ProcedureBlock>
> <TimeChanged>59701,18360.409814</TimeChanged>
>
> <Method name="StripHTML">
> <Description>
> This method strips HTML Markup from whole or partial strings
> (such as streams) where the string may end in between markup tags.</Description>
> <ClassMethod>1</ClassMethod>
> <FormalSpec><![CDATA[&str:%String,&Resume:%Boolean=0]]></FormalSpec>
> <ReturnType>%Status</ReturnType>
> <Implementation><![CDATA[
> s left="<", right=">"
> s l=$S(Resume:1,1:$F(str,left)),Resume=0
> While l>0 {
> s r=$F(str,right,l)
> s:(l'=0)&&(r=0) r=$L(str)+1, Resume=1
> s:l>0 $E(str,l-1,r-1)=""
> s l=$F(str,left)
> }
> Q Resume
> ]]></Implementation>
> </Method>
>
> <Method name="HTMLToText">
> <Description><![CDATA[
> Method to strip <Markup>(HTML/XML) from an input string leaving the text values suitable for plain/text viewing or file storage.<br>
> This method also attempts to convert special escape sequences to their native text character.<br>
> StartElem and EndElem parameter values (if specified) are exclusive (only data between are parsed).<br>
> The CompareOp parameter (vbBinaryCompare =0, vbTextCompare =1) is only used when evaluating the StartElem and EndElem string searches.]]></Description>
> <ClassMethod>1</ClassMethod>
>
> <FormalSpec><![CDATA[&stream:%Stream="",Compress:%Integer=0,StartElem:%String="",EndElem:%String="",CompareOp:%Integer=0]]></FormalSpec>
> <Language>basic</Language>
> <Implementation><![CDATA[
> Const MAX_LINE_LENGTH = 75
>  Dim arysplit, i, j, strOut,gt,ct, pBuff,ReadBytes
>  Dim stylestart,styleend
>
>  Dim baseFilter
>  ReadBytes=5000
>  baseFilter=Chr(0)
>  For i=1 to 31 'Non-printable character filter
>   if i <> 13 and i <> 10 then baseFilter = baseFilter & Chr(i)
>  Next
>
>  'Markup states
>  Dim inTR as Boolean
>  Dim inTD as Boolean
>  Dim inTABLE as Boolean
>  Dim inSCRIPT as Boolean
>  Dim inCOMMENT as Boolean
>  Dim inBODY as Boolean
>  Dim inSTYLE as Boolean
>  Dim inTITLE as Boolean
>  Dim inHEAD as Boolean
>  Dim inOL as Boolean
>  Dim inUL as Boolean
>  Dim inPROC as Boolean
>  Dim SUPRESS as Boolean
>  Dim PlaceInList as Integer
>  Dim cval as Integer
>  Dim chval as String
>
>  SUPRESS =False
>  inOL=False
>  inUL=False
>  inSCRIPT=False
>  inBODY=False
>  inHEAD=False
>  PlaceInList=0
>
>  pBuff=""
>  If IsObject(stream)=1 then
>   stream.Rewind()
>
> if stream.AtEnd then Return
> bytes= ReadBytes - Len(pBuff)
> pBuff=""
>   strtext=pBuff & stream.Read(bytes)
>
>  else
>   strtext=stream
>  end if
>  Do While Len(strtext)>0
>  j=0
>  if Len(StartElem) > 0 Then
>   i = Instr(1,lcase(strtext),lcase(StartElem),CompareOp)
>   If i > 0 Then strtext=Mid(strtext,i+Len(StartElem))
>  End If
>  if Len(EndElem) > 0 Then
>   i = Instr(1,lcase(strtext),lcase(EndElem),CompareOp)
>   If i > 0 Then strtext=Left(strtext,i-1)
>  End If
>
>  ';Handle Blocked <style ...></style> tags
>  styleend=-1
>   Do
>   styleend =InstrRev(LCase(strtext),"</style>",styleend,1)
>   if styleend > 0 then
>     stylestart =InstrRev(LCase(strtext),"<style",styleend,1)
>   if stylestart > 0 then
>   strtext = Left(strtext,stylestart-1) & Mid(strtext,styleend+8)
>   styleend=stylestart
>   else
>   strtext = Mid(strtext,styleend+8)
>   exit do
>   end if
>   else
>   exit do
>   End If
>  Loop
>
>  ct=Len(strtext,"<")
>   if ct > 0 then
>    arysplit = Split(pBuff & strtext, "<",-1) 'Prepend prev block remainder
>
>   For i =0 To ct-1
>
>      gt=InStr(arysplit(i), ">")
>      If inSCRIPT Then gt=0
>      tagBuff= lcase(left(arysplit(i),10))
>
>       If  gt > 0 Then
>       if Left(tagBuff,3)="!--" Or  Left(tagBuff,2)="![" Then
>       singleELEM = (Mid(arysplit(i),gt-2,2) = "--") or (Mid(arysplit(i),gt-1,1) = "]")
>       else
> singleELEM = (Mid(arysplit(i),gt-1,1) = "/")
> end if
>
>           arysplit(i) =  Me.Filter(Trim(Mid(arysplit(i), gt + 1 )), baseFilter)
>           if inTABLE then arysplit(i)= Replace(arysplit(i), vbCRLF,"`"  )
>           arysplit(i) = Replace( arysplit(i),  vbCrLf,"`")
>           arysplit(i) = Replace( arysplit(i), "`" ," ")
>           arysplit(i) = Replace( arysplit(i),"  ","``")
>
>           'arysplit(i) = Replace( arysplit(i),"`"," ",1,1)
>           arysplit(i) = Me.Filter( arysplit(i),"`")
>      else
>      pBuff= arysplit(i)
>      arysplit(i)=""
>      tagBuff=""
>      end if
>      if Left(tagBuff,4)="body" then
>      inBODY =Case( singleELEM,true:false,:true)
>       elseif Left(tagBuff,1)="p" then
>        if Not inTD then arysplit(i)= vbCrLf  & arysplit(i)
>       elseif Left(tagBuff,2)="/p" then
>        arysplit(i)= arysplit(i) & vbCrLf
>       elseif Left(tagBuff,1)="h" then
>
>        arysplit(i)=  vbCrLf & arysplit(i)
>       elseif Left(tagBuff,3)="img" then
>       arysplit(i)= "[Image Excluded] " & arysplit(i)
>       elseif Left(tagBuff,2)="/h" then
>
>        arysplit(i)=  vbCrLf & arysplit(i)
>      elseif Left(tagBuff,5)="/body" then
>      inBODY = false
>      elseif Left(tagBuff,2)="tr" then
>      inTR=Case( singleELEM,true:false,:true)
>
>      elseif Left(tagBuff,3)="/tr" then
>      arysplit(i)= vbCRLF & arysplit(i)
>       inTR=False
>      elseif Left(tagBuff,5)="table" then
>      inTABLE =Case( singleELEM,true:false,:true)
>      elseif Left(tagBuff,6)="/table" then
>       inTABLE=False
>      elseif  Left(tagBuff,2)="td" or Left(tagBuff,2)="th" then
>      inTD  =Case( singleELEM,true:false,:true)
>
>      elseif  Left(tagBuff,3)="/td" or Left(tagBuff,3)="/th" then
>      if inTD then arysplit(i)= Me.Filter(arysplit(i), vbCRLF) & vbTab
>      inTD = False
>      elseif  Left(tagBuff,6)="script" and Not inSCRIPT then
>      inSCRIPT  =Case( singleELEM,true:false,:true)
>      elseif  Left(tagBuff ,7)="/script" then
>      inSCRIPT = False
>      elseif Left(tagBuff,5)="style" then
>      inSTYLE  =Case( singleELEM,true:false,:true)
>      elseif Left(tagBuff,6)="/style" then
>      inSTYLE = False
>      elseif Left(tagBuff,3)="!--" then
>       inCOMMENT  =Case( singleELEM,true:false,:true)
>      elseif Left(tagBuff,5)="title" then
>      inTITLE  =Case( singleELEM,true:false,:true)
>      elseif Left(tagBuff,6)="/title" then
>      inTITLE = False
>      elseif Left(tagBuff,4)="head" then
>      inHEAD  =Case( singleELEM,true:false,:true)
>      elseif Left(tagBuff,5)="/head" then
>      inHEAD = False
>        elseif Left(tagBuff,2)="br" then
>        arysplit(i)= vbCrLf & arysplit(i)
>        elseif Left(tagBuff,2)="ol" then
>          inOL  =Case( singleELEM,true:false,:true)
>          arysplit(i)= vbCrLf & vbCrLf & arysplit(i)
>          PlaceInList=0
>       elseif Left(tagBuff,3)="/ol" then
>         inOL = False
>         PlaceInList=0
>        elseif Left(tagBuff,2)="ul" then
>          inUL  =Case( singleELEM,true:false,:true)
>          arysplit(i)= vbCrLf & vbCrLf & arysplit(i)
>
>       elseif Left(tagBuff,3)="/ul" then
>         inUL = False
>        elseif Left(tagBuff,2)="hr" then
>          if Not Compress then arysplit(i)= vbCrLf & String(70,45) & vbCrLf & arysplit(i)
>       elseif Left(tagBuff,2)="li" then
>              If inOL then PlaceInList = PlaceInList + 1
>        arysplit(i)= vbCrLf &  Case(inOL, true: PlaceInList & ". ",: Chr(149) & " ") & arysplit(i)
>       elseif Left(tagBuff,3)="/li" then
>        arysplit(i)= vbCrLf & arysplit(i)
>       else
>
>       End if
>       if inBODY and Not inSCRIPT and Not inCOMMENT _
>       and Not inSTYLE then
>         Do
>      tesc1 =instr(arysplit(i),"&#")
>      if tesc1 > 0 then
>      tesc2 =instr(tesc1,arysplit(i),";")
>      if tesc2 > tesc1 then
>      cval =+Mid(arysplit(i),tesc1+2,tesc2-tesc1-2)
>
>      if Abs(cval) <= 255 Then
>      chval = Chr(cval)
>      else
>      chval = Chr(151)
>      end if
>      arysplit(i)=Left(arysplit(i),tesc1-1) & chval & Mid(arysplit(i),tesc2+1)
>      else
>      exit do
>      end if
>      else
>      exit do
>      end if
>      Loop
>
>
>      End If
>
>   Next
>  End If
>  strtext = Join(arysplit)
>
>   EraseArray arysplit
>
> If Compress > 0 then
>  strtext=Me.Compress(strtext)
> else
>  strtext=Me.SetLineBoundary(strtext,MAX_LINE_LENGTH,vbCrLf) & vbCrLf
> End if
>
>  Print "%CSP.Page".UnescapeHTML(strtext)
>  strtext=""
>  If IsObject(stream)=1 then
> if stream.AtEnd then Exit Do
> bytes= ReadBytes - Len(pBuff)
>
>   strtext=pBuff & stream.Read( bytes)
>
>   pBuff=""
>  End If
>  Loop
>  Return
> ]]></Implementation>
> </Method>
>
> <Method name="SetLineBoundary">
> <ClassMethod>1</ClassMethod>
> <FormalSpec>strIn:%String,Length:%Integer=80,Delim:%String</FormalSpec>
> <Private>1</Private>
> <ReturnType>%String</ReturnType>
> <Implementation><![CDATA[
> s out="",crlf=$C(13,10),i=0,words=0
> s thisword=0
> q:strIn="" strIn
>
> For i=1:1:$length(strIn,crlf) {
> s line=$ZStrip($P(strIn,crlf,i),"<>W")
>
> if ($Length(line)'>Length) {
> s out=out_line_crlf
>
> } else {
> s wordct=$Length(line," ")
> s newline=""
> For thisword=1:1:wordct {
>
> s word=$P(line," ",thisword)_" "
>
> s:$Length(word_newline)>Length out=out_newline_crlf,newline=""
> s newline=newline_word }
>   s out=out_newline_crlf
>
> }
>
> }
>
> quit out
> ]]></Implementation>
> </Method>
>
> <Method name="QPDecode">
> <Description>
> Decode a Quoted-Printable String</Description>
> <ClassMethod>1</ClassMethod>
> <FormalSpec><![CDATA[&input:%String]]></FormalSpec>
> <ReturnType>%String</ReturnType>
> <Implementation><![CDATA[
>
>  set $zt="dce"
>
>  set in=$tr(input,"_",$c(32))
>  set text=$piece(in,"=")
>  for k=2:1:$length(in,"=") {
>  set p=$piece(in,"=",k)
>  set h=$extract(p,1,2)
>  if ($length(h)=2)&&($tr(h,"0123456789abcdefABCDEF")="") {
>   set text=text_$char($zhex(h))_$extract(p,3,$length(p))
>  } else {
>   set text=text_"="_p
>  }
>  }
>
>  s input=text
>  quit 1
> dce
>  s $zt=""
>
>  q 0
> ]]></Implementation>
> </Method>
>
> <Method name="Replace">
> <ClassMethod>1</ClassMethod>
> <FormalSpec>ins:%String="",find:%String="",with:%String=""</FormalSpec>
> <Language>basic</Language>
> <ReturnType>%String</ReturnType>
> <Implementation><![CDATA[
> Dim nstring as String
> nstring = Replace( ins, find, with)
> Return nstring
> ]]></Implementation>
> </Method>
>
> <Method name="Compress">
> <ClassMethod>1</ClassMethod>
> <FormalSpec>str:%String,mode:%Integer=0</FormalSpec>
> <Language>basic</Language>
> <ReturnType>%String</ReturnType>
> <Implementation><![CDATA[
> str=Replace(str, "  ", "``") 'remove dbl spaces
>       Return Me.Filter(str,  vbCrLf & vbTab & "`" &Chr(160))
> ]]></Implementation>
> </Method>
>
> <Method name="Filter">
> <ClassMethod>1</ClassMethod>
>
> <FormalSpec>strtext:%String,filter:%String,replwith:%String=""</FormalSpec>
> <Language>cache</Language>
> <ReturnType>%String</ReturnType>
> <Implementation><![CDATA[ q $TRANSLATE(strtext,filter,replwith)
> ]]></Implementation>
> </Method>
> </Class>
> <Checksum value="3725873425"/>
> </Export>
>
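
The StripHTML method in the export above takes a Resume flag so that a tag cut in half at the end of one chunk can be finished in the next. As a rough sketch of that idea only (it is not a translation of the attached code), here is the same technique in Python. The function name strip_tags_chunk and the sample chunks are assumptions made for the example.

```python
from typing import Tuple


def strip_tags_chunk(chunk: str, resume: bool = False) -> Tuple[str, bool]:
    """Remove <...> spans from one chunk of text.

    resume=True means the previous chunk ended inside an unclosed tag.
    Returns the stripped text and a flag telling the caller whether this
    chunk also ended inside a tag.
    """
    out = []
    pos = 0
    if resume:
        close = chunk.find(">")
        if close == -1:
            return "", True              # the whole chunk is still inside the tag
        pos = close + 1
    while True:
        start = chunk.find("<", pos)
        if start == -1:
            out.append(chunk[pos:])      # no more tags: keep the remainder
            return "".join(out), False
        out.append(chunk[pos:start])     # keep the text before the tag
        close = chunk.find(">", start + 1)
        if close == -1:
            return "".join(out), True    # tag continues in the next chunk
        pos = close + 1


chunks = ["Hello <b", ">world</b> and <i>more", "</i>!"]
pieces = []
inside_tag = False
for chunk in chunks:
    text, inside_tag = strip_tags_chunk(chunk, inside_tag)
    pieces.append(text)
stripped = "".join(pieces)
```

Carrying a single boolean between calls is enough here because only tag boundaries matter. Tracking script or comment blocks across chunks, as HTMLToText attempts, would need more state than this sketch keeps.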


