[EMAIL PROTECTED] wrote:

> > Raphael Luta wrote:
> >
> > That is a memory cache. The problem is that the PSML should be
> > fetched/parsed atomically and then stored in a memory cache such
> > as the Turbine one, to avoid race conditions if we use frames. This
> > is true of most external resources, which should be fetched atomically
> > and then cached until they expire.
> >
> > With regard to the retrieval of external resources (basically URLs),
> > Jetspeed needs an API that respects several requirements:
> >
> > - No resource will be fetched in parallel by more than one thread.
> > One thread will fetch it and the others will wait until completion.
> > - No resource will be fetched again until it expires.
> > - A writable resource will expire as soon as it is written.
> > - Calls to check availability and to get a Reader or a Writer on the
> > resource, plus a GetObject call for non-character-stream objects.
> > - I notice now that we need a hook for fetching, depending on the type
> > of object (e.g. parsing/unmarshalling XML is included in the process).
>
> This reminds me of the service concept we discussed some time ago.
> Similarly to other services, we could define a CacheService interface
> for which different implementations may exist.
>
> For obtaining content from remote URLs my preferred implementation
> would be a ProxyCacheService that just requests the URL from a proxy
> that does the actual caching for a cluster of portal servers.
>
> For obtaining the user PSML I'd prefer a SessionCacheService that holds
> the PSML tree in the session.
>
> Others might have other preferences and make different choices.
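The pluggable-service idea could look roughly like this in Java. All names are illustrative, not an actual Jetspeed interface: one CacheService contract, with interchangeable implementations behind it (a proxy-backed one, a session-backed one, and so on).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical service contract; implementations are swappable.
interface CacheService {
    String get(String key, Function<String, String> loader);
    void invalidate(String key);
}

// Minimal stand-in for a SessionCacheService: entries live in a plain map,
// much as they would in a per-user session.
class MapCacheService implements CacheService {
    private final Map<String, String> store = new HashMap<>();

    public synchronized String get(String key, Function<String, String> loader) {
        return store.computeIfAbsent(key, loader);
    }

    public synchronized void invalidate(String key) {
        store.remove(key);
    }
}
```

A ProxyCacheService would implement the same interface but delegate to an external caching proxy; callers would not need to know which one they got.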
>
> >
> > This is very important for external URLs (I remember when I fixed the
> > fact that the Netscape DTD was being fetched in parallel from the web a
> > few hundred times during Jetspeed initialization), but having a uniform
> > Resource Access API is also important, to be able to abstract the use
> > of different mechanisms/setups in Jetspeed.
> >
> > For example, a File-based implementation, an HTTP-based implementation
> > (with or without WebDAV)
>
> I found this on the web at http://www.webdav.org/: "WebDAV stands for
> 'Web-based Distributed Authoring and Versioning'. It is a set of
> extensions to the HTTP protocol which allows users to collaboratively
> edit and manage files on remote web servers." Do you mean this one?
>

Yes. It extends HTTP with MOVE, COPY, and locking methods, and also with the
ability to tag URLs with metainformation (handy for cache expiration or
other info). It is also supported by Apache through the mod_dav module, as I
learned at ApacheCon.

>
> > and a DB implementation are three common setups that we need to
> > handle. In the case of generic Objects, other marshalling/
> > unmarshalling implementations can be devised (for example the
> > unmarshalling of user PSML).
> >
> > We are currently considering WebDAV for our access to the resource
> > server in our first trial implementation: one machine will fetch the
> > resources and handle caching, expiration and locking of writes, while
> > a farm of Jetspeed servers will have WebDAV access configured to this
> > machine, and maybe local caching of some external resources. Still, it
> > will use the (misnamed) JetspeedDiskCacheEntry to coordinate access to
> > the resources.
>
> Sounds very reasonable. Basically, this means you have a proxy that does
> caching for all your portal servers, right? What do you mean by
> coordinating access to resources? If you have a proxy that does caching,
> wouldn't it coordinate access to resources on the web?
>

There are two different issues:
- A caching proxy, or similar, for remote URLs.
- Locking/transactional integrity for writable objects, such as user PSML or
user content (we plan to allow users to build, author and share their own
channels).

We thought about using an out-of-the-box proxy for storing the remote URLs,
but there is a caveat: most channels do not implement the Expires: HTTP
header correctly, so the proxy would fetch them every time. With extensions
to the feeder and cache services we plan to use OCS feeds to get the update
information. I personally think that getting the HTTP Expires: header right
is THE WAY, and that RDF is just a hint about typical update times, but most
servers don't do it, and they always send Expires: 0 for dynamic content.
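The workaround for the Expires: 0 problem could be sketched like this (class and method names are mine, purely illustrative): trust the server's Expires: header when it points into the future, but never cache for less than a configured minimum, so channels that send Expires: 0 or no header at all are not re-fetched on every request.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical expiration policy: server value wins only when it exceeds
// a configured minimum TTL floor.
class ExpirationPolicy {
    private final Duration minimumTtl;

    ExpirationPolicy(Duration minimumTtl) { this.minimumTtl = minimumTtl; }

    // expiresHeader: the Expires: value as epoch seconds, or null if absent.
    Instant expiresAt(Instant fetchedAt, Long expiresHeader) {
        Instant floor = fetchedAt.plus(minimumTtl);
        if (expiresHeader == null) return floor;
        Instant fromServer = Instant.ofEpochSecond(expiresHeader);
        return fromServer.isAfter(floor) ? fromServer : floor;
    }
}
```

OCS update hints could then be fed in simply by adjusting the minimum TTL per channel.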

We have already experienced IP blocks from network54 due to the excessive
stress we put on them when fetching their channels listed on xmltree.com, so
I think it is critical to get the semantics of channel loading right.

So, we are planning to build on the current JetspeedCacheService (renaming
it and making it a more modular, fully writable service) to use it as our
"specialized" proxy. If we have time, we will look into using the HTTP
protocol handler from Jigsaw (www.w3.org/jigsaw) instead of Sun's. This
handler allows limiting simultaneous connections to the same host, as a
means to avoid unintended DoS attacks :-)
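The per-host connection cap could be sketched with one Semaphore per host, roughly like this. This is only an illustration of the idea, not the Jigsaw handler's actual API, and all names are made up:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical per-host connection limiter.
class HostLimiter {
    private final int maxPerHost;
    private final Map<String, Semaphore> perHost = new ConcurrentHashMap<>();

    HostLimiter(int maxPerHost) { this.maxPerHost = maxPerHost; }

    // true if a connection slot for this host was available
    boolean tryAcquire(String host) {
        return perHost.computeIfAbsent(host, h -> new Semaphore(maxPerHost))
                      .tryAcquire();
    }

    // give the slot back once the fetch is done
    void release(String host) {
        perHost.get(host).release();
    }
}
```

A fetcher would call tryAcquire before opening a connection and release in a finally block, which would have prevented the network54 situation above.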

>
> Best regards,
>
> Thomas
>
> Thomas Schaeck
> IBM Pervasive Computing Division
> Phone: +49-(0)7031-16-3479  e-mail: [EMAIL PROTECTED]
> Address: IBM Deutschland Entwicklung GmbH,
> Schoenaicher Str. 220, 71032 Boeblingen, Germany
>
> --
> --------------------------------------------------------------
> Please read the FAQ! <http://java.apache.org/faq/>
> To subscribe:        [EMAIL PROTECTED]
> To unsubscribe:      [EMAIL PROTECTED]
> Archives and Other:  <http://marc.theaimsgroup.com/?l=jetspeed>
> Problems?:           [EMAIL PROTECTED]


