Please have a look at the earlier discussions on this topic. Here is just a short wrap-up:

The problems are:
 - Cookies, which have to be managed
 - URLs (relative URLs have to be made absolute, and URLs have to be
   rerouted through the portal, so that the result of a submit is
   displayed in the context of the portal)
 - Variables (have to be prefixed and handled to prevent collisions)
 - JavaScript functions (also collision problems)
 - Some JavaScript events, which may result in collisions
 - CSS
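
To make the URL item concrete, here is a minimal sketch in Python (not
Jetspeed code). It assumes a hypothetical portal endpoint
"/portal/proxy?target=" and a made-up helper name, rewrite_urls. It makes
every href, src and action attribute absolute, and reroutes links and form
targets through the portal so that the result shows up in the portal context.
A real implementation would need a proper HTML parser and would have to skip
things like "javascript:" and "mailto:" URLs.

```python
import re
from urllib.parse import urljoin, quote

# Hypothetical portal endpoint (an assumption, not a Jetspeed URL) that
# fetches the target page on the user's behalf.
PORTAL_PROXY = "/portal/proxy?target="

# Matches href="...", src="..." and action="..." in single or double quotes.
ATTR = re.compile(r'\b(href|src|action)\s*=\s*(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_urls(html, base_url):
    """Make URLs absolute; reroute links and form targets via the portal."""

    def fix(match):
        name, quote_char, url = match.group(1), match.group(2), match.group(3)
        # Resolve the (possibly relative) URL against the page it came from.
        absolute = urljoin(base_url, url)
        # Navigation (href) and submits (action) must come back through the
        # portal; embedded resources (src) can be loaded directly.
        if name.lower() in ("href", "action"):
            absolute = PORTAL_PROXY + quote(absolute, safe="")
        return f"{name}={quote_char}{absolute}{quote_char}"

    return ATTR.sub(fix, html)


if __name__ == "__main__":
    page = '<a href="inbox.html">Inbox</a> <img src="/logo.gif">'
    print(rewrite_urls(page, "http://mail.example.com/app/"))
```

Images are deliberately left pointing at the origin server here; whether they
should also go through the portal is a design decision (cookies and access
control may force it).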

This stuff was discussed a few days ago.

David Sean Taylor and we will try to address some of these issues in the
near future. We have solutions for some of these issues and want to
contribute parts of them back to open source.

The first approach would be a specialized "filter" portlet, which can address
some of these problems. Still, I think that most of these areas *must* be
part of the Jetspeed core engine. In some areas you need knowledge of the
whole page to proceed correctly, so it makes no sense to do it inside a
portlet.
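
As an illustration of why page-level knowledge matters, here is a minimal
Python sketch (again not Jetspeed code; prefix_names and aggregate are
made-up names). Two fragments that both define a JavaScript function "check"
only stop colliding if each gets a prefix that is unique on the page, and
only the component that assembles the whole page can hand out such prefixes.
The textual replacement is naive and would also touch visible text, so take
it as a sketch of the idea only.

```python
import re


def prefix_names(fragment, portlet_id, names):
    """Prefix each identifier in `names` with the portlet id (whole words only)."""
    if not names:
        return fragment
    # Longest names first, so a longer name is preferred over a shorter one.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in ordered) + r")\b")
    return pattern.sub(lambda m: f"{portlet_id}_{m.group(1)}", fragment)


def aggregate(fragments):
    """Assemble a page from (fragment, names) pairs.

    The ids are assigned here, at page level, because only the aggregator
    knows which other fragments end up on the same page.
    """
    parts = []
    for index, (fragment, names) in enumerate(fragments):
        parts.append(prefix_names(fragment, f"p{index}", names))
    return "\n".join(parts)


if __name__ == "__main__":
    mail = '<script>function check() {}</script><form onsubmit="return check()">'
    news = '<script>function check() {}</script><a onclick="check()">go</a>'
    print(aggregate([(mail, ["check"]), (news, ["check"])]))
```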

I think these issues are extremely important: otherwise it is not possible
to run "real" applications inside a portlet, and a lot of other sources on
the web will surely not work directly either.

Marcus

> -----Original Message-----
> From: Diethelm Guallar, Gonzalo [mailto:[EMAIL PROTECTED]]
> Sent: Friday, 10 November 2000 17:11
> To: 'JetSpeed'
> Subject: Web aggregation/scraping
> 
> 
> Hello,
> 
> I have been looking into ways of doing web page scraping.
> If there is partial or complete overlap with previous
> discussions, please excuse me, it is due to my poor and
> partial understanding of this subject.
> 
> Basically, page scraping means integrating information from
> different web pages into one page (sounds like Jetspeed?).
> The canonical example is, say you have several different
> web mail accounts (yahoo, hotmail, mail, etc.). Using
> web page scraping, you could create a single consolidated
> page that presents to you all the messages from all the
> mail accounts. This implies several things:
> 
> * You transparently log on to each mail service, with
>   a potentially different logon protocol.
> * You programmatically navigate to the page with the
>   mail messages for each service, process ("scrape")
>   that page looking for messages, and integrate them
>   into your consolidated page, eventually changing the
>   content, look and feel and general formatting of the
>   original page.
> * You translate any URLs or references on the fly, so that
>   the links from your consolidated page still work.
> * Eventually, you interact with your consolidated page
>   (say, you reply to a message) and that in turn triggers
>   a new programmatic interaction with the mail service
>   that achieves the intended purpose (i.e. it sends
>   the reply).
> 
> I have the impression that there is at least a level
> of overlap between these requirements and what Jetspeed
> provides (or will provide); is this correct? Is this
> one of the directions Jetspeed would (eventually) move?
> 
> I think there is one piece in the page scraping thing
> that is not present today in Jetspeed, which is the tools
> or model you would use to do the actual scraping: how
> do you specify things like:
> 
> * On a page with mail messages from yahoo, the From
>   line is contained on the second table in the page,
>   column 3.
> * Strip any content belonging to a form named "foo".
> * etc.
> 
> I'm not even sure about all the things you would want
> to do, but these certainly look like a possibility.
> This kind of functionality is provided today by services
> such as yodel-e, and I think it would make an interesting
> addition to Jetspeed. What would be a good model to
> achieve this?
> 
> I have been reading a paper about IBM WebEntree, a Java
> component that does this kind of thing. The paper is at
> 
>   http://www.research.ibm.com/journal/sj/374/zhao.html
> 
> and is dated 1998. Does anybody know anything more about this?
> I was unable to find any other references to it. Does anybody
> know of other (free, open source or commercial) tools
> to do this, especially Java-based ones?
> 
> Thanks for any input, comments and flames (which would
> prove, in the end, my lack of knowledge in the area).
> 
> 
> -- 
> Gonzalo A. Diethelm
> [EMAIL PROTECTED]
> 
> 
> --
> --------------------------------------------------------------
> Please read the FAQ! <http://java.apache.org/faq/>
> To subscribe:        [EMAIL PROTECTED]
> To unsubscribe:      [EMAIL PROTECTED]
> Archives and Other:  <http://marc.theaimsgroup.com/?l=jetspeed>
> Problems?:           [EMAIL PROTECTED]
> 

