Re: Scalability problem in PortletSetFactory ?

Santiago Gala Thu, 16 Nov 2000 04:32:00 -0800


[EMAIL PROTECTED] wrote:

> Raphael Luta wrote:
>
> > > [EMAIL PROTECTED] wrote:
> > > Whether it is better to store the PSML reference in a session or in
> > > some other data structure depends on the portal usage pattern. We
> > > envision a usage pattern where a large number of users accesses the
> > > portal mostly between 0 and 5 times a day; each user can have a
> > > personalized page or gets the default page. We expect a large
> > > percentage of users to just use the default page.
> > >
> > > Storing the reference to the PSML data structure in the session is the
> > > most adequate solution for this scenario, as the PSML file is only
> > > parsed once per session and can be garbage collected immediately after
> > > the session expires. As each user has a personal page, the PSML cannot
> > > be shared between users in our scenario.
> > >
> >
> > If I understand your scenario correctly:
> > - you'll never really use the cache for the personalized pages because
> >   your users will probably expire their sessions between requests (as
> >   there are few requests/day expected)
>
> Not quite. Users may e.g. access the portal once per day, view their
> personal portal page, click on a link, go back to their page, click on
> another link, etc. In this case the data in the session or in the cache
> would be used.
>
> > - you'll heavily benefit from having the default PSML always in cache.
> >
> > In what respect would an MRU cache *not* fit your needs ?
>
> Would you use an MRU cache implementation similar to that described at
>
> http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/perf4.html
>  ?
>
> I'm not sure how a MRU cache would perform under high load, with
> some ten-thousands of concurrent users on an app server with a big
> thread pool running on a multi-processor machine - there might be
> synchronization issues resulting in temporarily blocked threads.
>
> The MRU cache probably needs synchronization of the methods to
> get/remove elements to/from the cache and needs to find/remove
> an element of the list and add it at the head of the list for
> each cache hit ... You may need to synchronize on the cache while
> you are reading and parsing the PSML to get the object structure to
> put in the cache. All threads handling concurrent requests that
> want to get something from the cache during that time would be
> blocked until the cache has been updated. As it takes some time to
> read a PSML file from disk and create the object structure that
> represents it, it may happen that the threads that cause cache
> misses block the threads that would cause cache hits for a
> significant amount of time.
>

I solved a similar problem for the DiskCache (synchronising while a potentially long
HTTP request was performed) by maintaining a list of "in process" URLs, each with a
Vector of the Threads requesting that URL. All the Threads wait on the interned URL
string (unique). When the first Thread (the one actually loading the URL) finishes, it
calls notify() on the interned String used as the monitor.

This limits contention a lot. For the disk cache, it works quite well. (The
suggestion of using url.intern() as a "dock" for the waiting threads came from
Ricardo Rocha, and it is a bright one.)

For the PortletSetRegistry, it would work as follows:

Once a PSML is requested, the system checks whether it is being loaded/parsed. If it
is, the thread adds itself to a list and waits. This is fast, and synchronized on the
"active urls" list.

If it is not, the Thread checks (synced on the cache) whether the PSML is there. If it
is there, it returns the object.

If it is not, it adds itself (still synced on the cache, and also on the loading list)
to the "active" list, gives up the monitor on the cache and the list, and proceeds with
the loading/parsing of the resource.

Finally (literally "finally {}" on a try/catch) it (synced again) adds the object to
the cache, removes itself from the list, notifies any waiting Threads, and proceeds
using the PSML object or propagates any Throwable received.

Any Thread waiting is notified, and repeats the whole process.
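
The steps above could be sketched roughly as follows. This is only an illustration, not
the DiskCache or PortletSetFactory code: the class name InFlightLoadingCache and the
loader parameter are made up for the example, it uses a single lock object instead of
separate monitors on the cache, the "active" list and the interned url, and it calls
notifyAll() so that every waiter re-checks the cache.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: only one thread loads a given url, the others wait for it. */
class InFlightLoadingCache<V> {
    private final Object lock = new Object();              // guards both collections
    private final Map<String, V> cache = new HashMap<>();  // parsed objects by url
    private final Set<String> loading = new HashSet<>();   // the "active urls" list

    V get(String url, Function<String, V> loader) throws InterruptedException {
        synchronized (lock) {
            while (true) {
                V hit = cache.get(url);
                if (hit != null) {
                    return hit;                 // already parsed: fast path
                }
                if (loading.add(url)) {
                    break;                      // nobody is loading it: we become the loader
                }
                lock.wait();                    // someone else is loading; wait() releases the lock
            }
        }
        V loaded = null;
        try {
            loaded = loader.apply(url);         // slow read/parse, done outside the lock
            return loaded;
        } finally {
            synchronized (lock) {
                if (loaded != null) {
                    cache.put(url, loaded);     // publish the result
                }
                loading.remove(url);
                lock.notifyAll();               // waiters wake up and repeat the whole check
            }
        }
    }
}
```

If the loader throws, nothing is stored, the url leaves the "active" set, and the first
waiter to wake up becomes the new loader, which matches "repeats the whole process".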

There should be little contention in any case, because only one user session is active
for any user login (typically) and we would only have to serialize different frames or
windows from the same session until the user PSML is parsed and stored in the cache.

>
> If we store the object structure once per user session,
> all threads can still run in parallel without synchronization.
> Application servers can manage sessions, use of self-programmed
> caches takes away control from them.
>

We would still need to synchronize either session access to the object or its
retrieval/parsing. Otherwise several threads can race to parse and store the user PSML
in the session (I imagine a setup with frames, where several requests arrive nearly
concurrently on the same session for each hit).

>
> My feeling is that if you run a portal in an environment where
> performance really counts (let's say on a machine with 8+ processors
> and some GB of RAM) we may be better off by just accepting a certain
> memory footprint per session, calculating the amount of RAM needed and
> plugging it into the machine. I don't know how big the object tree
> representing the PSML can get, but if we assume 40 KB and 10000
> concurrent sessions, this would mean a memory usage of only 400 MB.
>

I would say that currently Jetspeed cannot handle more than a few requests per second.
This means that a farm of servers should be used under high load, which divides the
number of concurrent sessions per server down to a more realistic figure.


>
> The other question I have is how we would handle changes of the
> user's PSML ? Assume we have a page customizer and a user changes
> his page layout. Would the page customizer remove or update the
> PSML in the cache ? Or would the cache check whether the file has
> changed each time it is accessed ?
>
> > The only disadvantage I see to the MRU is that it will use a
> > little more memory under small load (because the pages will
> > be persisted in the cache and not released).
> > Once the MRU is full or nearly full, the behavior and cost
> > associated to the MRU should be about the same that the cost
> > associated to the session cache. Am I missing something here ?
>
> Behavior under small load does not worry me.
>
> > > I understand that there are other cases, where PSML files can
> > > be shared between users. Is the time consumed for parsing the
> > > PSML and generating the object tree that represents it or
> > > memory usage per session the problem that you see when using
> > > the session approach ?
> > >
> >
> > My main concern about using a session cache is that we're making
> > a usage pattern assumption which may not be true in some
> > installations. I'd like the portal engine to be agnostic to usage
> > pattern.
>
> The MRU cache relies on the assumption that it is advantageous
> to cache the n last recently used elements. If for example you
> have 10000 users with custom pages, who request their home page
> every ten minutes (home page, read article, back to home page,
> read article, back to home page, read article, ...) you get a
> mean of 1000 requests for the home page per minute. If you have a
> MRU cache with a size of 11000 you're fine, if the MRU cache can
> store 9000 elements, it may happen that you only get cache misses
> all the time, because a user's PSML is always discarded just
> before the user would have accessed his home page again.
>

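The worst case described above can be reproduced with a small simulation. This is a
sketch under assumptions: it takes the "MRU cache" of this thread to mean a cache that
evicts the least recently used entry, models it with java.util.LinkedHashMap in access
order, and has every user request the home page in strict round-robin order. The class
name LruThrashDemo and the method countHits are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical demo: cyclic access through a cache smaller than the working set. */
class LruThrashDemo {
    /** Hits seen when `users` keys are requested round-robin for `rounds` passes. */
    static int countHits(int users, int capacity, int rounds) {
        // accessOrder = true keeps entries ordered from least to most recently used
        Map<Integer, String> lru = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > capacity;       // evict the least recently used entry
            }
        };
        int hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (int u = 0; u < users; u++) {
                if (lru.get(u) != null) {
                    hits++;                     // PSML still cached
                } else {
                    lru.put(u, "psml-" + u);    // miss: "parse" and store
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(countHits(10_000, 9_000, 5));   // prints 0
        System.out.println(countHits(10_000, 11_000, 5));  // prints 40000
    }
}
```

With 9000 slots every entry is evicted just before its user comes back, so there are no
hits at all; with 11000 slots only the first pass misses. Real traffic is not strictly
round-robin, so the actual hit rate would fall somewhere in between.
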
We can have the best of both worlds if we make the PortletSetFactory object implement
HttpSessionBindingListener, store it both in the session and as a Singleton, and make
it configurable to destroy the object when the session expires and the number of
entries stored in the SingletonHolder is above a certain value. (The servlet engine
should unbind any objects in the Session when the session is invalidated.)
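
A rough sketch of that idea, kept free of the servlet API so it compiles on its own:
the UnbindCallback interface below is a stand-in for the valueUnbound() callback of
javax.servlet.http.HttpSessionBindingListener (the real method takes an
HttpSessionBindingEvent), and SessionBoundRegistrySketch, CachedPsml and maxRetained
are names invented for the example, not existing Jetspeed classes or properties.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: shared registry whose entries are dropped on session unbind. */
class SessionBoundRegistrySketch {
    /** Stand-in for HttpSessionBindingListener, reduced to the unbind callback. */
    interface UnbindCallback {
        void valueUnbound();
    }

    private final Map<String, CachedPsml> byUrl = new ConcurrentHashMap<>();
    private final int maxRetained;   // configurable threshold (assumed property)

    SessionBoundRegistrySketch(int maxRetained) {
        this.maxRetained = maxRetained;
    }

    /** The object that would be stored both in the session and in the registry. */
    final class CachedPsml implements UnbindCallback {
        final String url;

        CachedPsml(String url) {
            this.url = url;
        }

        @Override
        public void valueUnbound() {
            // Session expired: release the entry only if the registry is over its limit.
            // The size check and the removal are not atomic; good enough for a sketch.
            if (byUrl.size() > maxRetained) {
                byUrl.remove(url, this);
            }
        }
    }

    CachedPsml getOrCreate(String url) {
        return byUrl.computeIfAbsent(url, u -> new CachedPsml(u));
    }

    int size() {
        return byUrl.size();
    }
}
```

Under small load the entries simply stay in the registry after their sessions expire;
once the registry grows past maxRetained, each expiring session takes its entry along.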

It also looks simple to implement in our current source base.

The idea just came to me. What do you think?

>
> > Optimization for a given pattern should be handled at a
> > pluggable component level.
>
> I surely agree with that. We might have an MRUCache and a
> "SessionCache". We could let them implement the same interface
> to make them exchangeable. I guess we'd have to pass RunData
> to allow a cache implementation to put data in the session
> if required. A property may be used to determine which strategy
> to use.
>
> It would be ok to start with a MRU cache if it is possible to
> replace it with a "cache" that uses the session to store data
> when required.
>
> > The Profiler component is currently responsible for implementing
> > the usage pattern, maybe we can add methods to the Profiler API
> > to allow a profiler to provide caching hints for a cache system ?
>
> A hint might be "per-user", "per-group" or "global", perhaps. For
> per-user data, the "SessionCache" might be used, for per-group or
> global data the MRU cache. That would add some additional complexity,
> though.
>
> It seems the UserProfiler also does some caching. Would this mean
> we cache the PSML file in the DiskCache and the object tree that
> is generated from it in the MRU cache ?
>

I don't think there is caching there, only retrieval of the user PSML, passing through
the (misnamed) JetspeedDiskCache. It should be called JetspeedResourceManagerService or
something similar. The classes are rather small, but I may have read them wrong. The
caching comes from the fact that the PortletSetFactory for each PSML is a singleton,
stored under the PSML url key.

When the code is refactored, I will send a proposal to have the JetspeedDiskCache as a
pluggable ResourceManager service, with an API to retrieve, expire, get a Reader or a
Writer on a resource, etc., handling parallel requests for the same resource.

>
> Best regards,
>
> Thomas
>
> Thomas Schaeck
> IBM Pervasive Computing Division
> Phone: +49-(0)7031-16-3479  e-mail: [EMAIL PROTECTED]
> Address: IBM Deutschland Entwicklung GmbH,
> Schoenaicher Str. 220, 71032 Boeblingen, Germany
>
> --
> --------------------------------------------------------------
> Please read the FAQ! <http://java.apache.org/faq/>
> To subscribe:        [EMAIL PROTECTED]
> To unsubscribe:      [EMAIL PROTECTED]
> Archives and Other:  <http://marc.theaimsgroup.com/?l=jetspeed>
> Problems?:           [EMAIL PROTECTED]


