How to read a HTML file from the browser cache only

Marc Gueury Thu, 30 Dec 2004 06:11:33 -0800

Hello all,

I would like to write a Firefox extension that validates
the HTML of the page being browsed, in real time. The HTML
validation code is already written and works fine.

I need to get the HTML text of a page when browsing.
I found all I needed except one thing. I have no idea how
to get the HTML text of the current page from the cache only.

I need opinions and feedback, since I am completely lost.
Here is what I have tested so far:

1.

    // Works, but fetches the content a second time!
    const oldURL  = window.content.document.URL;
    const request = new XMLHttpRequest();

    // This must be done to make generated content render
    request.open("GET", oldURL, false);
    request.send("");

This works fine, but the problem with XMLHttpRequest is that its
internal code sets a load flag with SKIP_CACHE. The page is
therefore always fetched again from the web server, which I
absolutely want to avoid.

I could rewrite the whole of XMLHttpRequest, but
it is a really long piece of code (about 2000 lines).

2. Then I did something like this:

    try {
      const nsICacheService = Components.interfaces.nsICacheService;
      const cacheService =
        Components.classes["@mozilla.org/network/cache-service;1"]
                  .getService(nsICacheService);
      var httpCacheSession = cacheService.createSession("HTTP", 0, true);
      httpCacheSession.doomEntriesIfExpired = false;
      var cacheEntryDescriptor = httpCacheSession.openCacheEntry(
        url, Components.interfaces.nsICache.ACCESS_READ, false);

      if (cacheEntryDescriptor) {
        this.lastFetched = cacheEntryDescriptor.lastFetched;
        alert("device = " + cacheEntryDescriptor.deviceID +
              " / size = " + cacheEntryDescriptor.dataSize);

        if (cacheEntryDescriptor.isStreamBased()) {
          var inputStream = cacheEntryDescriptor.openInputStream(0);
          const scriptableStream =
            Components.classes["@mozilla.org/scriptableinputstream;1"]
                      .createInstance(Components.interfaces.nsIScriptableInputStream);
          scriptableStream.init(inputStream);

          var s = scriptableStream.read(scriptableStream.available());
          alert(s);

          scriptableStream.close();
          inputStream.close();
        }
      }
    }
    catch(ex)
    {
    ...

It works in some cases, when the page is ASCII and not compressed.
It does not work when:
- the HTTP header has: Content-Encoding: gzip
  -> I get the compressed file and not the HTML
- the protocol is "file://"
  -> it seems not to be cached
- the page is not ASCII
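
One possible way to tell these cases apart is sketched below. It is only an illustration under assumptions, not a tested extension: it assumes the raw HTTP response head of the cached entry is available as a string (possibly through the descriptor's getMetaDataElement("response-head"), which is not verified here), and that a TextDecoder is available in the environment. The function names describeCachedBody and byteStringToText are hypothetical. The sketch only detects gzip; the actual decompression step is not shown.

```javascript
// Hypothetical helper: inspect a raw HTTP response head (status line plus
// headers) and report how the cached body bytes would need to be decoded.
function describeCachedBody(responseHead) {
  var info = { gzip: false, charset: null };
  var lines = responseHead.split(/\r?\n/);
  for (var i = 1; i < lines.length; i++) {       // line 0 is the status line
    var colon = lines[i].indexOf(":");
    if (colon < 0) continue;                     // not a "Name: value" line
    var name = lines[i].slice(0, colon).trim().toLowerCase();
    var value = lines[i].slice(colon + 1).trim();
    if (name === "content-encoding") {
      // The cached body is still compressed when this header names gzip.
      info.gzip = /\b(x-)?gzip\b/i.test(value);
    } else if (name === "content-type") {
      var m = /charset\s*=\s*"?([^";\s]+)/i.exec(value);
      if (m) info.charset = m[1].toLowerCase();
    }
  }
  return info;
}

// Hypothetical helper: a scriptable stream read() hands back one character
// per byte, so the string is turned back into bytes before it is decoded
// with the page's charset. Assumes TextDecoder exists; defaults to UTF-8.
function byteStringToText(byteString, charset) {
  var bytes = new Uint8Array(byteString.length);
  for (var i = 0; i < byteString.length; i++) {
    bytes[i] = byteString.charCodeAt(i) & 0xff;
  }
  return new TextDecoder(charset || "utf-8").decode(bytes);
}
```

With something like this, the text read from the cache would be passed to byteStringToText only when describeCachedBody reports gzip as false; a gzip body would first need to go through a decompressor.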

In a way, this is better than the first test, because I am sure that
the content comes from the cache and that I never request the data
from the web server again. But I have the problems above.

Any ideas or pointers on how to do this in C or JavaScript? There may
be a simple way that I am missing completely.

Thanks in advance,

Marc
_______________________________________________
Mozilla-netlib mailing list
Mozilla-netlib@mozilla.org
http://mail.mozilla.org/listinfo/mozilla-netlib
