Re: [RT] Workflow and publishing using JCR

solprovider Wed, 07 Dec 2005 18:09:09 -0800

On 12/7/05, Felix Röthenbacher <[EMAIL PROTECTED]> wrote:
> Andreas Hartmann wrote:
> > I'm currently working on a draft of a "proper" JCR-based Lenya
> > repository API. When it came to the notorious "area" concept, the
> > following options appeared to be reasonable:
> > 1) Use different subtrees in the same workspace for different areas.
> > 2) Use different workspaces for different areas.
> > 3) Use version history labels.
> > ---
> > Option (1) has the disadvantage that corresponding nodes in different
> > areas have no implicit connection to each other. All areas have to be
> > maintained "manually", without support from the JCR API.
> I wonder if there is any good reason to maintain the correspondence
> between nodes in different areas by hand rather than use the mechanisms
> supported by JCR?
> > ---
> > Option (2) has the disadvantage that copying (cloning) nodes from one
> > workspace to the other can't be done inside a session. Nevertheless,
> > they can be bracketed in transactions. A major advantage of option (2)
> > over option (1) is that corresponding nodes in different workspaces
> > share the same UUID and the same version history, which means there is
> > an implicit connection. A node can be updated (merged) from its
> > corresponding node in another workspace. You can think of workspaces
> > as SVN branches.
> As mentioned, a transaction can be used. One reason for updating
> to Jetty Plus and including JOTM was exactly this use case.
> One issue we should consider, though, is that transaction
> support wasn't fully implemented in Jackrabbit six months ago
> when I played around with it: a transaction could be set to
> ready-to-commit, but it was already committed in that phase,
> meaning a rollback was no longer possible. That was the status
> quo six months ago, and I don't know how far transaction
> support goes these days in Jackrabbit.
> > ---
> > Option (3) is a totally different approach. Some introductory points:
> > - each versionable node has a history (version graph)
> > - versions can have labels
> > - labels can be moved from one version to another
> > This means, publishing a node could mean to add the label "live" to
> > the last checked-in version (and remove it from a potential previous
> > version). Deactivating the node would mean to entirely remove the
> > label "live" from the version history.
> How is versioning of different areas handled? I think of a live
> area which I can easily revert if something bad happened during
> publishing. Same applies to the authoring area. I can't see
> how it's accomplished to get a snapshot of both
> live and authoring area with a certain time stamp.
> > For option (3), Session.save() has no effect as well, but it can be
> > executed in a transaction. This would allow publishing a large number
> > of nodes with the possibility of rolling back the whole operation.
> > The particularity of option (3) is that no copies are created, which
> > means it's rather a snapshot of the version history of the site.
> > You can think of it as an SVN tag. Live nodes can't be altered,
> > only replaced with new versions.
> > I'm not yet sure how the site structure would be handled in option (3).


I have always been critical of the "area" concept, and I dislike #1
and #2 for the reasons stated.  This sounds like the time to improve
the architecture by using something like #3.

In Lenya 1.2 terms, I would prefer:
content/docpath/doc-id/language/versions

There have been suggestions to move "language" even earlier in the
path.  In JCR, language could be either a node or a property.  I
cannot think of any reason to have it as a node.

"versions" could be creation-time-based names for each revision, such
as "200512071751.xml", plus "index.xml" or "live.xml" for the currently
published version.  Archived revisions could be prefixed with "a".
Trash could be prefixed with "d" (for about to be "deleted").
Publishing is simply overwriting "live.xml" with the desired revision.
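
A minimal sketch of that naming scheme, using a plain Python dict in
place of the "versions" directory.  The function names are made up for
illustration; nothing here is Lenya or JCR API.

```python
from datetime import datetime

LIVE_NAME = "live.xml"  # the post also offers "index.xml" as an alternative


def revision_name(created: datetime, prefix: str = "") -> str:
    """Creation-time-based name; prefix "a" marks archive, "d" marks trash."""
    return f"{prefix}{created.strftime('%Y%m%d%H%M')}.xml"


def publish(versions: dict, revision: str) -> None:
    """Publishing overwrites live.xml with the content of the chosen revision."""
    versions[LIVE_NAME] = versions[revision]


def archive(versions: dict, revision: str) -> str:
    """Rename a revision so that its name carries the "a" prefix."""
    archived = "a" + revision
    versions[archived] = versions.pop(revision)
    return archived


# Hypothetical example data.
versions = {}
first = revision_name(datetime(2005, 12, 7, 17, 51))
second = revision_name(datetime(2005, 12, 8, 9, 30))
versions[first] = "<html>first</html>"
versions[second] = "<html>second</html>"

publish(versions, first)
publish(versions, second)          # live.xml now holds the second revision
archived_name = archive(versions, first)
```

Every revision stays in the same place, and the live copy is always
found under one fixed name.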

The advantage is that every revision of a document is in one place.
The disadvantage was that the live version would no longer be distinct
in the file system, and that distinction was useful.  That disadvantage
does not apply when using Jackrabbit because file operations are very
difficult.

===
In JCR terms, use properties for the revision nodes:
- status = "archived", "deleted", "active", "draft" (useful so someone
else does not publish it before it is ready)
- created = {time}
- edited = {time of last edit}
- language (??)

There should also be a history (for each language) at the document
node recording which revisions were published and when.  The document
node should also specify which revision is the live one (unless that
node is duplicated with a special identifier).

Combining all this into the simplest structure (N=Node, P=Property,
-=1, +=many):
N:Content
+ N:DocumentID
- + N:SubdocumentID (same structure as DocumentID)
- - N:Document (extra node to ease distinguishing between the
Document's Nodes and Nodes for the subdocuments.)
- - - P:Type (DocumentType = XHTML...)
- - - P:LiveRevision (for each language)
- - - P:Status = Draft, Review, Published, Version (published but
another version is ready)
- - - P:Visibility
- - - P:Creator
- - - P:Languages Available
- - - P:{Other document properties}
- - - N:History of publishing for each language
- - + N:Revision
- - - - P:Status
- - - - P:Language
- - - - P:Editor
- - - - P:{Other revision properties}
- - - - N:ContentInformation
- - - - - N|P:Document-type-specific fields
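
As a rough illustration of the structure above, here is a sketch with
plain Python dataclasses standing in for Nodes and Properties.  It is
not the JCR API.  The field names, and the rule that a "draft" revision
cannot be published, are assumptions drawn from the property lists in
this post.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Revision:
    name: str
    language: str
    editor: str
    status: str = "draft"  # "draft", "active", "archived" or "deleted"
    content: str = ""


@dataclass
class Document:
    doc_type: str
    creator: str
    revisions: Dict[str, Revision] = field(default_factory=dict)
    # P:LiveRevision, one entry per language: language -> revision name
    live_revision: Dict[str, str] = field(default_factory=dict)
    # N:History, one list per language: (time published, revision name)
    history: Dict[str, List[Tuple[datetime, str]]] = field(default_factory=dict)

    def add_revision(self, revision: Revision) -> None:
        self.revisions[revision.name] = revision

    def publish(self, name: str, when: datetime) -> None:
        revision = self.revisions[name]
        if revision.status != "active":
            # "draft" exists so nobody publishes a revision before it is ready.
            raise ValueError(f"revision {name} is {revision.status}, not active")
        self.live_revision[revision.language] = name
        self.history.setdefault(revision.language, []).append((when, name))


# Hypothetical example data.
doc = Document(doc_type="XHTML", creator="solprovider")
doc.add_revision(Revision("200512071751", "en", "alice", status="active"))
doc.add_revision(Revision("200512081000", "en", "bob"))  # still a draft
doc.publish("200512071751", datetime(2005, 12, 7, 18, 0))

try:
    doc.publish("200512081000", datetime(2005, 12, 8, 10, 5))
    draft_rejected = False
except ValueError:
    draft_rejected = True
```

Publishing touches only the document node: it moves the live pointer
for one language and appends to that language's history.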

Which properties apply to the document and which apply to the
revision?  Title, Navigation Title, Subject, and Description could be
either.  They could also be moved to Nodes with their own versioning,
but that may be overkill.  Does rolling back to a revision also roll
back those fields?  Do we need a history of changes to those fields?
Should changing the Title create a new revision?  Those answers will
determine the best structure.

Lenya 1.2's various sitemap.xml files should be built dynamically from
the content.  That may require caching for good performance.

===
Another decision is whether subdocuments are Nodes of the document. 
There has been much discussion about using a unique identifier for
every document.  This can be implemented as a minor change to the
above.  First, discard the "N:Document" node; it is no longer
necessary because there would be no N:SubdocumentID nodes.  Second,
add a P:ParentDocumentID.

Now use a flat structure:
N:Content
- N:Hierarchy (created from the Document Nodes, could include
information such as Visibility, substitute for sitetree.xml)
+ N:DocumentUniqueID
- - P:ParentDocumentID - only if a subdocument
- - P:Path
- - Everything under N:Document above

This is really good for many reasons:
1. No need to distinguish between content Node and subdocument Nodes.
2. Easy operation on multiple or all documents, such as building the
Search index.
3. Easy moving of subdocuments. Just change the ParentID, rather than
moving Nodes.
4. Easy creation of flat views.  See all documents by Title, Author,
Status, Type.  The Status view will show all documents having versions
waiting to be published.  The Type view can show all Employees, even
if they are created as subdocuments of Departments.
5. All information is stored in the Document Nodes, so "Visibility"
and "Navigation Title" are available when working with the document.
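
Points 2 and 4 could look something like the sketch below: with a flat
list of documents, a "view" is just a grouping by one property.  The
sample documents and the view_by helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical flat list of Document Nodes.
documents = [
    {"id": "d1", "title": "Sales", "type": "Department", "status": "Published"},
    {"id": "d2", "title": "Alice", "type": "Employee", "status": "Version"},
    {"id": "d3", "title": "Bob", "type": "Employee", "status": "Draft"},
]


def view_by(docs: list, prop: str) -> dict:
    """Group document IDs by the value of one property."""
    view = defaultdict(list)
    for doc in docs:
        view[doc[prop]].append(doc["id"])
    return dict(view)


by_type = view_by(documents, "type")
by_status = view_by(documents, "status")

# "Version" means published, but another version is ready.
waiting = by_status.get("Version", [])
```

The Type view lists every Employee no matter which Department it was
created under, because the grouping never looks at the parent.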

It also imposes additional concerns:
1. The tree of documents (N:Hierarchy) must be maintained for easy
traversal.  It should only contain the DocumentUniqueIDs.  Creating
sitetree.xml should access the document Nodes for the Visibility
property.  This allows creating menus based on other properties. 
Eventually someone will add individual document security, and menus
will be built to show only the documents allowed to each person (based
on their name and/or Groups).

2. What happens to orphans?  Subdocuments will not automatically be
deleted when the Document Node is deleted.  It would be easy to create
a view of documents that have ParentDocumentIDs that no longer exist. 
Or the Delete process could also delete all subdocuments recursively.
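
Both ways of handling orphans can be sketched in a few lines, again
assuming a flat dict of documents keyed by unique ID with a "parent"
entry for ParentDocumentID.  The names and data are hypothetical.

```python
def find_orphans(docs: dict) -> list:
    """IDs of documents whose ParentDocumentID no longer exists."""
    return sorted(
        doc_id
        for doc_id, doc in docs.items()
        if doc["parent"] is not None and doc["parent"] not in docs
    )


def delete_recursive(docs: dict, doc_id: str) -> None:
    """Delete a document together with all of its subdocuments."""
    docs.pop(doc_id, None)  # remove first, so a cyclic parent link cannot loop
    for child in [c for c, d in docs.items() if d["parent"] == doc_id]:
        delete_recursive(docs, child)


def sample() -> dict:
    # Hypothetical example data: a -> b -> c, plus an unrelated d.
    return {
        "a": {"parent": None},
        "b": {"parent": "a"},
        "c": {"parent": "b"},
        "d": {"parent": None},
    }


# Plain delete: "b" becomes an orphan ("c" still has its parent "b").
plain = sample()
del plain["a"]
orphans = find_orphans(plain)

# Recursive delete: the whole subtree goes, nothing is orphaned.
recursive = sample()
delete_recursive(recursive, "a")
```

The orphan view only reports the top of each detached subtree;
deeper subdocuments still have an existing parent.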

solprovider
