[RT] Implementing Cocoon Blocks

Stefano Mazzocchi Tue, 19 Aug 2003 19:18:57 -0700

This is a collection of (more or less) random thoughts about the implementation of Cocoon Blocks that I collected while talking with Ricardo and Sylvain IRL.

Please note that anything proposed here, while organic and workable, is not to be considered carved in stone, but rather a suggestion on how to move forward.

- o -

Design Constraints
------------------

1) impact on backward compatibility should be minimal, optimally none. that is: everything that worked before the introduction of blocks should continue to work with no required changes [this will reduce migration issues]

2) the implementation should be incremental and evolutionary. no radical changes should be made to the cocoon architecture [this will reduce the amount of code to write and also provide better regression]

3) the CVS tree should be buildable at all times [this will be enforced by an evolutionary approach to the implementation]

4) security of the architecture for block managing and deploying is a *TOP* priority and should be introduced up front.

5) deployment should be system administrator-friendly. that is, should *NOT* require GUIs or webapps (even if it should allow them to be possible)

- o -

The overall architecture
------------------------

Let's start with the first requirement: security.

Blocks are functional components at the webapp level. If a user is able to change the block wiring, the user is, potentially, able to execute his/her own code with the same security level as the entire cocoon application.

For this reason, the block wiring information should be located in a configuration file that is "read-only" by cocoon and "read/write" by the block deployer.

  +--------+                          +----------------+
  | cocoon | <--- [File System] <---> | block deployer |
  +--------+                          +----------------+

Note that the block deployer *could* be anything (a CLI, a webapp, an eclipse plugin). The above meets our second requirement: user friendliness for all types of users.

Also note that it potentially allows system administrators to perform actions such as 'staging' and 'cluster replication' by simply performing a file copy. Cocoon should be able to reload the block wiring information whenever it changes.

In order to improve security and avoid DoS, there is *no way* for the block deployer to signal information directly to the cocoon instance (and no way for the cocoon instance to modify the wiring information or to communicate directly with the block deployer). Everything is performed through the file system.
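Since the file system is the only channel, cocoon has to notice wiring changes by itself. A minimal sketch of such a check, assuming simple polling of the wiring file (the class name `WiringWatcher` and the polling approach are illustrative assumptions, not part of the proposal):

```python
import os


class WiringWatcher:
    """Hypothetical helper: detects changes to the wiring file by polling."""

    def __init__(self, path):
        self.path = path
        self._last_seen = None  # (mtime_ns, size) recorded by the previous call

    def changed(self):
        """Return True when the file's stamp differs from the previous call.

        A missing file has the stamp None, so a file that never existed
        is reported as unchanged until it appears.
        """
        try:
            st = os.stat(self.path)
            stamp = (st.st_mtime_ns, st.st_size)
        except FileNotFoundError:
            stamp = None
        if stamp != self._last_seen:
            self._last_seen = stamp
            return True
        return False
```

The cocoon side would call `changed()` periodically and re-read the wiring when it returns True; the deployer never has to contact cocoon.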

The block deployer
------------------

The block deployer architecture is the following:

          +--------------------+     +------------------+
          | +-----------+      |     |                  |
 [FS] <---->| FS Driver |      | <-> |  User Interface  |
          | +-----------+      |     |                  |
          |                    |     +------------------+
          |      block         |
          |    services        |
          |                    |
          |  +---------+       |
          | +---------+|       |
          | | Locator |+       |
          | +---------+        |
          +------^-------------+
                 |
                 V
            block library

which is composed of four main parts:

1) the file system driver: the part responsible for reading/writing the block wiring information and block configurations, for extracting the files from the block distribution archives and for physically deploying the extracted files on the file system. There is no need for polymorphism in this part, since there needs to be a solid file system contract between this driver and the cocoon block manager (included inside cocoon), which will need to read the block wiring info and locate the files on the file system.

2) the block locator: the part responsible for locating the metadata associated with a given block identifier and thus providing enough data for the block services and the user interface to drive the installation process. This part needs polymorphism. Potential implementations of this locator are:

a) "file system"-based locator: the block metadata and location information is stored in a file on disk.

b) "network service"-based locator: the block metadata is provided by a network service (for example, a web service).

The block deployer can use multiple locators at the same time, in a cascading way: it should be possible to configure the block deployer with the kinds of location services to use and a priority among them. This makes it possible, for example, to provide an architecture for block discovery that could work like this:

block deployer ---> company block library -> cocoon official library

[a collection of blocks is called a "block library". the application that, given a block identifier, looks up its metadata is called "block librarian".]

3) the block services: the part that is shared by all potential block deployers (no matter how the user interface is implemented).

4) the user interface: the part that drives the block services and is specific to the kind of user interface chosen.
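The cascading lookup described for the block locator could be sketched as follows. The class names, the `locate` method and the in-memory dictionaries are assumptions made for illustration; a real locator would read a file or call a network service.

```python
class DictLocator:
    """Hypothetical locator backed by an in-memory dict.

    Stands in for a "file system"-based or "network service"-based locator.
    """

    def __init__(self, name, entries):
        self.name = name
        # Identifiers are case insensitive, so keys are stored lowercased.
        self.entries = {key.lower(): value for key, value in entries.items()}

    def locate(self, block_id):
        """Return the metadata for block_id, or None if unknown here."""
        return self.entries.get(block_id.lower())


class CascadingLocator:
    """Asks each configured locator in priority order; first answer wins."""

    def __init__(self, locators):
        self.locators = list(locators)  # highest priority first

    def locate(self, block_id):
        for locator in self.locators:
            metadata = locator.locate(block_id)
            if metadata is not None:
                return locator.name, metadata
        return None


company = DictLocator("company block library", {
    "cob:mycompany.com/skin/2.0": {"kind": "behavior"},
})
official = DictLocator("cocoon official library", {
    "cob:apache.org/cocoon/PDF/2.6": {"kind": "behavior"},
    "cob:mycompany.com/skin/2.0": {"kind": "stale copy"},
})
library = CascadingLocator([company, official])
```

Here the company library shadows the official one for any identifier both of them know about.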

- o -

The Block Manager
-----------------

The block manager is the part that is responsible for handling the block wiring information. This is included inside cocoon and it can read and interpret the block wiring information written by the block deployer.

The block manager is the only part of cocoon that knows how blocks are wired together and where their actual location on disk is.

The block manager will be queried by all the cocoon internal services that need to locate block-dependent stuff, that is:

 1) the sitemap interpreter: to find out where the blocks' sitemaps are mounted in the main sitemap URL space
 2) the block: protocol: to locate the services provided by the blocks
 3) the component manager: to locate components provided by the blocks (avalon components, sitemap components or virtual components)

- o -

File System Layout and wiring data
----------------------------------

Let us suppose we have the following blocks that are deployed in our system

  cob:mycompany.com/webmail/1.3.43
   has a sitemap located on -> /webmail.xmap
   depends on -> cob:mycompany.com/skin
     names this dependency -> external-skin
   depends on -> cob:mycompany.com/skin/2.0
     names this dependency -> internal-skin
   depends on -> cob:anothercompany.com/MailRepository/2.0
     names this dependency -> repository
     uses component -> "com.anothercompany.repository.Repository"
       names this component with role -> repository
   requires the configurations:
     "user" of type string with no default
     "password" of type string with no default

  cob:yetanothercompany.com/skins/fancy/1.2.2
    implements -> cob:mycompany.com/skin/1.2

  cob:mycompany.com/skins/corporate/34.3.345
    implements -> cob:mycompany.com/skin/2.3
    extends -> cob:yetanothercompany.com/skins/fancy/1.2.2

  cob:mycompany.com/repositories/email/exchange/3.2.1
    implements -> cob:anothercompany.com/MailRepository/2.0
    exposes component -> "com.anothercompany.repository.Repository"
    requires the configurations:
     "host" of type string, with default "127.0.0.1"

the above information is extracted from the block metadata included inside the blocks themselves and is deployment independent (also, the deployment process cannot modify these properties)

The deployment process added the mounting, wiring and configuration information

 cob:mycompany.com/webmail/1.3.43
  located at -> WEB-INF/blocks/384938958499
  mounted on -> /mail/
  "external-skin" -> cob:yetanothercompany.com/skins/fancy/1.2.2
  "internal-skin" -> cob:mycompany.com/skins/corporate/34.3.345
  "repository" -> cob:mycompany.com/repositories/email/exchange/3.2.1
  configured as:
   user -> "guest"
   password -> "sj3u493"

 cob:mycompany.com/repositories/email/exchange/3.2.1
  located at -> WEB-INF/blocks/394781274834
  configured as:
    host -> "mail.blah.org"

 cob:yetanothercompany.com/skins/fancy/1.2.2
  located at -> WEB-INF/blocks/947384127832

 cob:mycompany.com/skins/corporate/34.3.345
  located at -> WEB-INF/blocks/746394782637
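As a rough illustration, the deployment information above could be held in memory like this. The dictionary layout and the `resolve` helper are hypothetical; the proposal does not define the wiring data model.

```python
# Hypothetical in-memory form of the wiring listed above.
WIRING = {
    "cob:mycompany.com/webmail/1.3.43": {
        "location": "WEB-INF/blocks/384938958499",
        "mount": "/mail/",
        "wires": {
            "external-skin": "cob:yetanothercompany.com/skins/fancy/1.2.2",
            "internal-skin": "cob:mycompany.com/skins/corporate/34.3.345",
            "repository": "cob:mycompany.com/repositories/email/exchange/3.2.1",
        },
        "config": {"user": "guest", "password": "sj3u493"},
    },
    "cob:mycompany.com/repositories/email/exchange/3.2.1": {
        "location": "WEB-INF/blocks/394781274834",
        "wires": {},
        "config": {"host": "mail.blah.org"},
    },
    "cob:yetanothercompany.com/skins/fancy/1.2.2": {
        "location": "WEB-INF/blocks/947384127832",
        "wires": {},
        "config": {},
    },
    "cob:mycompany.com/skins/corporate/34.3.345": {
        "location": "WEB-INF/blocks/746394782637",
        "wires": {},
        "config": {},
    },
}


def resolve(wiring, block_id, dependency_name):
    """Return (block id, location) of the block wired to a named dependency."""
    target = wiring[block_id]["wires"][dependency_name]
    return target, wiring[target]["location"]
```

For example, `resolve(WIRING, "cob:mycompany.com/webmail/1.3.43", "repository")` yields the exchange repository block and its folder under WEB-INF/blocks.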

the file system layout (relative to the cocoon webapp context) is

 [-] WEB-INF
      L___ [-] blocks
            L___ wiring.xml
            L___ [-] 384938958499
            |     L___ [-] BLOCK-INF
            |     |     L___ block.xml
            |     L_ (the contents of cob:mycompany.com/webmail/1.3.43)
            L___ [-] 947384127832
            |     L___ [-] BLOCK-INF
            |     |     L___ block.xml
            |     L_ (the contents of cob:yetanothercompany.com/skins/fancy/1.2.2)
            L___ [-] 746394782637
            |     L___ [-] BLOCK-INF
            |     |     L___ block.xml
            |     L_ (the contents of cob:mycompany.com/skins/corporate/34.3.345)
            L___ [-] 394781274834
                  L___ [-] BLOCK-INF
                  |     L___ block.xml
                  L_ (the contents of cob:mycompany.com/repositories/email/exchange/3.2.1)

where

wiring.xml contains the block IDs (which also identify their location on disk), wiring, mounting and configurations.

block.xml contains the block metadata (which belongs to the block and cannot be changed at deployment time).

NOTE: if the location path of the block is relative, it is searched by starting from the cocoon war context. The block content is *always* extracted from the archives and saved "as is" inside the folder.

NOTE (development time): in order to simplify block creation and development, it will be possible to explicitly indicate the location of an already existing and extracted block implementation on disk. The block manager should also have autoreloading features (configurable, of course) that should reload the configurations, the wiring and the exposed components when they are changed.

- o -


Issues that were still unsolved
-------------------------------

1) block identification

All blocks (behaviors and implementations) are identified by a URI. the format of the URI is as follows:

cob:organization/name/x.y(.z)

where

cob: is a virtual protocol that is used instead of http:// to avoid the problem of mistaking the URI for a URL

"organization" is the unique identifier for the organization that is responsible for the maintenance of that identifier. the ICANN domain name should be used [for example, apache.org for the ASF and so on]

"name" is the unique name of the identifier. it is suggested that a path delimiter is used to further specialize the name (see below for examples)

x.y.z is the version identifier

   x -> major (>= 1)
   y -> minor (>= 0)
   z -> bugfix (>= 0) (only for implementations)

NOTE: identifiers are case insensitive.

Examples of good identifiers are

  cob:apache.org/cocoon/PDF/2.6
  cob:apache.org/cocoon/Fop/3.4.34
  cob:apache.org/cocoon/iText/1.0.43
  cob:mycompany.com/mydepartment/myself/myblock/3.2.23

Examples of bad identifiers

cob:cocoon.apache.org/whatever/2.3.434

the use of the virtual host instead of the domain name should be avoided because it mixes location and identification concerns.

cob:apache.org/cocoon/block/whatever/2.3.4

the inclusion of the "block" name should be avoided because it is redundant (the cob virtual protocol was introduced exactly to specify block specificity and avoid location and identification semantic collisions)

cob:apache.org/cocoon/PDF/Fop/2.3.43

information of what behavior is implemented by a given block implementation should not be included in the identifier.
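A small parser for this identifier format might look like the sketch below. The `BlockId` type and `parse_block_id` function are illustrative names; the version is treated as optional because identifiers such as cob:mycompany.com/skin appear without one when used as dependencies. The sketch checks only the syntax, not the naming recommendations above.

```python
import re
from typing import NamedTuple, Optional


class BlockId(NamedTuple):
    organization: str
    name: str
    major: Optional[int]
    minor: Optional[int]
    bugfix: Optional[int]


_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_block_id(uri):
    """Parse 'cob:organization/name[/x.y[.z]]' into a BlockId."""
    uri = uri.strip().lower()  # identifiers are case insensitive
    if not uri.startswith("cob:"):
        raise ValueError("not a cob: identifier: %r" % uri)
    parts = uri[len("cob:"):].split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError("malformed identifier: %r" % uri)
    numbers = [None, None, None]
    version = _VERSION.match(parts[-1])
    if version:
        numbers = [int(g) if g is not None else None for g in version.groups()]
        parts = parts[:-1]
    if len(parts) < 2:
        raise ValueError("identifier has no name: %r" % uri)
    if numbers[0] is not None and numbers[0] < 1:
        raise ValueError("major version must be >= 1: %r" % uri)
    return BlockId(parts[0], "/".join(parts[1:]), *numbers)
```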

2) dependency ranges

When a block implementation depends on another block (either implementation or behavior), it should be able to have an 'elastic' dependency which doesn't connect it to the versioned identifier, but to a range of those versions.

Instead of defining an explicit range description language, it is suggested to describe range rules implicitly. These implicit range rules are:

a) if the dependency doesn't include the version, all versions are matched

ex: both "cob:apache.org/blah/1.0" and "cob:apache.org/blah/3.43.342" are matched by "cob:apache.org/blah"

b) if the dependency includes a version, versions are matched with the following rules

   i) if major is equal
  ii) if minor is greater or equal
 iii) in case of implementations and if minor is equal, if bugfix is greater or equal

ex: depending on "cob:apache.org/blah/2.0.34" will match

        - cob:apache.org/blah/2.0.345
        - cob:apache.org/blah/2.3.23

but not

        - cob:apache.org/blah/1.0.0
        - cob:apache.org/blah/34.323.324534
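The implicit range rules can be written down in a few lines. This is a sketch under the reading that all of rules i) to iii) must hold together; `split_id` and `matches` are hypothetical names.

```python
def split_id(uri):
    """Split 'cob:org/name[/x.y[.z]]' into (unversioned id, version tuple or None)."""
    uri = uri.lower()  # identifiers are case insensitive
    head, _, last = uri.rpartition("/")
    pieces = last.split(".")
    if head and 2 <= len(pieces) <= 3 and all(p.isdecimal() for p in pieces):
        return head, tuple(int(p) for p in pieces)
    return uri, None


def matches(dependency, candidate):
    """Return True if the versioned candidate satisfies the dependency."""
    dep_name, dep_version = split_id(dependency)
    cand_name, cand_version = split_id(candidate)
    if dep_name != cand_name or cand_version is None:
        return False
    if dep_version is None:
        return True  # rule a: no version in the dependency matches all versions
    if cand_version[0] != dep_version[0]:
        return False  # rule b.i: major must be equal
    if cand_version[1] != dep_version[1]:
        return cand_version[1] > dep_version[1]  # rule b.ii: minor greater or equal
    if len(dep_version) == 3 and len(cand_version) == 3:
        return cand_version[2] >= dep_version[2]  # rule b.iii: bugfix greater or equal
    return True
```

With these rules, `matches("cob:apache.org/blah/2.0.34", "cob:apache.org/blah/2.3.23")` is true while the same dependency rejects both 1.0.0 and 34.323.324534, as in the examples above.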

3) persistent service behavior with hot deployment

One of the big issues with hot deployment is the potentially inconsistent state of the persistent services contained by one block and used by another when the providing block is redeployed.

The issue is easily solvable for block services provided via sitemap by imposing them as stateless services (or REST-like, by passing all the required information every time).

The problem appears evident for component instances.

It is suggested that blocks don't allow direct classloading between blocks, but that only components exposed in the block deployment descriptor will be made available to other blocks. This way, all the dependencies are known because all the component loading happens through the Block Manager, and the block manager is able to dispose of and reinstantiate all the blocks that contain instances of components that are in an inconsistent state.

While it is possible to write a classloader which is smart enough to do the above even for transparent classloading (say, loading via "new Blah()" instead of via cocoon.getComponent("Blah")), it is suggested to disallow direct classloading to avoid creating hidden contracts between blocks.
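To make the bookkeeping concrete: if every component lookup goes through one place, that place can record who obtained components from whom and work out what to dispose of on a redeploy. The `ComponentRegistry` class below is a hypothetical stand-in for that part of the block manager, not a proposed API.

```python
from collections import defaultdict


class ComponentRegistry:
    """Hypothetical sketch of lookup tracking inside the block manager."""

    def __init__(self):
        self._providers = {}                # role -> block id exposing it
        self._consumers = defaultdict(set)  # provider block id -> consumer block ids

    def expose(self, block_id, role):
        """Register a component role exposed by a block's descriptor."""
        self._providers[role] = block_id

    def lookup(self, consumer_block_id, role):
        """Record the dependency and return the id of the providing block."""
        provider = self._providers[role]
        self._consumers[provider].add(consumer_block_id)
        return provider

    def affected_by_redeploy(self, block_id):
        """Return the block plus every block that, directly or transitively,
        obtained components from it and therefore must be reinstantiated."""
        affected, pending = set(), [block_id]
        while pending:
            current = pending.pop()
            if current not in affected:
                affected.add(current)
                pending.extend(self._consumers.get(current, ()))
        return affected
```

A component created with "new Blah()" across block boundaries would bypass `lookup` and stay invisible to this bookkeeping, which is the hidden contract the text warns about.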

4) block mounting

Some blocks are meant to be publicly accessible and, for this reason, they can be "mountable" onto a particular location of the URL space handled by Cocoon.

Such mounting will be "implicit", meaning that the main cocoon sitemap will not be modified by the block deployer.

This means that, in order to achieve backward compatibility, when a block is deployed on cocoon, the sitemap interpreter asks the block manager whether there is a mounted block that matches the incoming request. If so, that block is invoked; otherwise, it falls back on the main sitemap.

This implies that it's entirely possible for a block to "obscure" pipelines located in the main cocoon sitemap (or subsitemaps mounted the direct way in there). Even so, it is suggested that the sitemap interpreter doesn't fall back to the main sitemap if the block sitemap is invoked but no matching pipeline is found. This is to avoid potentially dangerous (security-wise) holes in the block URL-space coverage that could lead to hard-to-forecast issues.

This means that the sitemap interpreter should:

 - check with the block manager if a block matches the request
 - if so, pass the request to the block that is mounted in that location
 - if no pipeline matches the request in that block, trigger a 404
 - if no block is mounted on that location, invoke the cocoon main sitemap
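The dispatch steps above can be sketched as follows. Picking the longest matching mount prefix is an assumption of this sketch (the text does not say how overlapping mounts are resolved), and sitemaps are modelled as plain functions that return a response body or None.

```python
def find_mount(path, mounts):
    """Return the longest mount prefix that covers the path, or None."""
    candidates = [prefix for prefix in mounts if path.startswith(prefix)]
    return max(candidates, key=len) if candidates else None


def dispatch(path, mounts, main_sitemap):
    """Route a request path and return (status, body).

    mounts maps a mount prefix to a block sitemap; a sitemap is a callable
    taking a path and returning a body, or None when no pipeline matches.
    """
    mount = find_mount(path, mounts)
    if mount is None:
        # No block is mounted here: use the main sitemap as before.
        body = main_sitemap(path)
    else:
        # A block owns this part of the URL space: never fall back.
        body = mounts[mount](path[len(mount):])
    return (200, body) if body is not None else (404, None)


mounts = {"/mail/": lambda sub: "inbox page" if sub == "inbox" else None}
main = lambda path: "main:" + path
```

A request for /mail/missing ends in a 404 even though the main sitemap would have answered it, which is exactly the "no fallback" behavior suggested above.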

5) block configuration at deployment time

Blocks will contain configurations that are written at block-release time, but some information is deployment dependent. The block deployment descriptor contains a list of those configurations that are required to be entered at deployment time.

Since these configurations will mostly be context-dependent tokens, they can be considered more as properties. An example of a descriptor could be:

 <properties>
  <property name="username">
   <default>guest</default>
   <description>The name of the user</description>
  </property>
  ...
 </properties>

then, these values will be accessible in the usual block.xconf using {name} style. For example

...
<datasources>
 <datasource name="rbdms">
  <username>{username}</username>
  ...
 </datasource>
</datasources>
...
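A sketch of how the {name} tokens might be filled in at deployment time. The function names are hypothetical, and two choices here are assumptions rather than part of the proposal: unknown tokens are left untouched, and values are XML-escaped before insertion.

```python
import re
from xml.sax.saxutils import escape

_TOKEN = re.compile(r"\{([A-Za-z_][\w.-]*)\}")


def resolve_properties(declared, supplied):
    """Merge deployment-time values with descriptor defaults.

    declared maps property name -> default (None when there is no default);
    supplied maps property name -> value entered at deployment time.
    """
    resolved = {}
    for name, default in declared.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif default is not None:
            resolved[name] = default
        else:
            raise KeyError("property %r has no default and was not supplied" % name)
    return resolved


def substitute(text, properties):
    """Replace each {name} token that names a known property."""
    def replace(match):
        name = match.group(1)
        if name in properties:
            return escape(properties[name])
        return match.group(0)  # unknown token: leave it as written
    return _TOKEN.sub(replace, text)
```

For instance, with a declared default of "guest" for username and nothing supplied, `<username>{username}</username>` becomes `<username>guest</username>`.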

- o -

Implementation Phases
---------------------

Phase 1: definition of the contract between the block manager inside cocoon and the standalone block deployer. These contracts include:

 1) description of the file system layout (see above for a suggestion)
 2) description of the wiring document schema
 3) description of the block metadata schema

Phase 2: definition and implementation of the block data model, with reading/writing capabilities

 1) implementation of the block wiring data model
 2) implementation of the xml -> data model parser
 3) implementation of the data model -> xml serializer

NOTE: since the xml formats are *not* meant to be human editable, roundtripping of comments or formatting included in those xml files should not be a priority.

At this point, implementation can proceed in parallel:

Phase 3 - cocoon side: implementation of block support.

This phase includes:

 3a) implementation of the BlockManager
 3b) implementation of the block: protocol handler
 3c) implementation of the link transformer
 3d) implementation of the reload watchdog

[note: the link transformer has to be "block" aware in order to identify where other blocks are mounted]

NOTE: during this phase, development can happen with a handwritten and extracted block wiring info and block descriptors.

Phase 3 - deployer side: definition of the interfaces between the components:

  3a) the Locator interface
  3b) the Block services interfaces

Phase 4 - deployer side: implementation of a basic block deployer

  4a) implementation of the block services
  4b) implementation of a "file system"-based locator
  4c) implementation of a command-line user interface

Phase 5 - deployer side: implementation of a webservice block librarian

  5a) implementation of a REST-style web service locator
  5b) implementation of a cocoon block that implements block librarian capabilities

- o -

Awaiting your comments.

--
Stefano.
