Bob Dohse wrote:

. > Glenn,

. > Why not parse for the elements contained within the brackets?

. > Search for the leading bracket ... 
. > - if followed by a legal element (e.g., "<img src"), then ...
. > - replace everything within the quotes (i.e., the folder and file name)
. > with the Arachne assigned name
. > - the result would be an acceptable Arachne reference valid for offline
. > use

. > Bob ~

Bob,

There is no need to parse the page source, as the pairings of original URL 
and cached file name are all included in CACHE.IDX.  I got a text version of 
CACHE.IDX by using the -c option of WWWMAN to create CACHEIDX.HTM and then 
running HTMSTRIP on it to get pure text, so that I would not have to copy 
full source page reference paths into an e-mail message.

The CACHE.IDX is that of my home page, MY.YAHOO.  The first few lines of 
CACHE.IDX are:  (My comments will be all upper case for distinction.)

                                  Cache index
Index of Arachne WWW cache
------------------------------------------------------------------------------
-- Cache index filename: cache\cache.idx
------------------------------------------------------------------------------
--  http://my.yahoo.com/ <--THIS PAGE SOURCE IS CACHED AS-------
                                                                |
                                                                V
                                                      =====================
 Sun Jun 08 08:29:03 2003 | 40630 bytes | text/html | P:\CACHE\55075314.HTM

      DO A SEARCH IN CACHED PAGE SOURCE FOR-
                                            |
                                            V
 ==============================================
 http://us.i1.yimg.com/us.yimg.com/i/my/top7.gif

      AND REPLACE IT WITH-----------------------------------
                                                            |
                                                            V
                                                     ====================
 Sun Jun 08 08:29:07 2003 | 1965 bytes | image/gif | P:\CACHE\55075351.GIF

      DO ANOTHER SEARCH IN CACHED PAGE SOURCE FOR--
                                                   |
 http://us.i1.yimg.com/us.yimg.com/i/spacer.gif  <-

      AND REPLACE IT WITH----------------------------------
                                                           |
                                                           V
                                                   ====================
 Sun Jun 08 08:29:07 2003 | 43 bytes | image/gif | P:\CACHE\55075362.GIF

      ETC., ETC., ETC.

Hope this is much clearer.  It makes no difference whether the searched-for 
item begins with "http://...." or not.  Once every reference has been 
replaced this way, the page can be read offline with the images shown.
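
The search-and-replace steps above can be sketched in Python.  This is only 
an illustration, not anything Arachne or WWWMAN provides: it assumes the 
stripped index has the layout shown in the excerpt (a line containing the 
URL, followed at some point by a "date | size | type | path" line), and the 
function names parse_index and localize are made up for the example.

```python
import re

# Assumption: every index entry is a line containing a URL, followed later
# by a line with four "|"-separated fields whose last field is the cached file.
URL_RE = re.compile(r"(?:https?|ftp)://\S+")


def parse_index(text):
    """Return a list of (url, cached_path) pairs from the stripped index text."""
    pairs = []
    pending_url = None
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 4 and pending_url is not None:
            # "date | size | type | path" line: pair it with the last URL seen.
            pairs.append((pending_url, fields[3]))
            pending_url = None
            continue
        match = URL_RE.search(line)
        if match:
            pending_url = match.group(0)
    return pairs


def localize(page_source, pairs):
    """Replace each URL in the page source with its cached file name."""
    # Longest URLs first, so a URL that is a prefix of another one
    # does not clobber the longer reference.
    for url, path in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        page_source = page_source.replace(url, path)
    return page_source
```

Typical use would be to read the stripped index, drop the entry for the page 
itself (otherwise its URL would also be replaced wherever it appears as a 
prefix of some other link), and pass the remaining pairs to localize along 
with the contents of the cached .HTM file.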

It may even be prudent to replace the cached page source name (55075314.HTM) 
with a descriptive 8.3 filename so that the page is recognizable when reading 
offline, e.g., for this example, replace the cached page source name, 
55075314.HTM, with myyahoo1.htm
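
A minimal sketch of that renaming step, again in Python and again only an 
illustration: is_8dot3 and save_as are hypothetical helper names, the 
character set accepted is deliberately conservative, and the file is copied 
rather than renamed so that the entry in CACHE.IDX still points at a file 
that exists.

```python
import os
import re
import shutil

# Conservative 8.3 check: 1-8 name characters, then optionally a dot and
# 1-3 extension characters.  DOS allows a few more symbols than this.
NAME_8DOT3 = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")


def is_8dot3(name):
    """Return True if name fits the 8.3 pattern above."""
    return NAME_8DOT3.fullmatch(name) is not None


def save_as(cached_path, new_name):
    """Copy a cached file to new_name in the same directory; return the new path."""
    if not is_8dot3(new_name):
        raise ValueError("not an 8.3 filename: " + new_name)
    target = os.path.join(os.path.dirname(cached_path), new_name)
    shutil.copyfile(cached_path, target)
    return target
```

For the example above, the call would be save_as with the path of 
55075314.HTM and the new name myyahoo1.htm.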

Roger Turk
Tucson, Arizona
