Hello Olivier,

At 09:39 01/09/2004, you wrote:
> The latter is far more complex, as you will need to go from
> unstructured content to structured content.
> Of course you may simply import your raw HTML files into a basic
> Jahia HTML template (e.g. you extract or recreate the sitemap in
> Jahia, remove the navigation from your HTML files (with some kind of
> Perl scripts) and finally import these files as HTML files in Jahia
> and automatically map them to a basic template with just one large
> unstructured HTML area in the central column. Then the editors may
> just call a WYSIWYG HTML editor in order to edit their text. Of
> course you will also have to clean up the HTML with a tool such as
> Tidy - http://tidy.sourceforge.net/ - in order to be sure that the
> fragment of imported HTML will not "break" the full page in which it
> will be included).
> I must say that I am not convinced at all by this solution, as it
> just looks like a kind of FTP access to some HTML files. You lose all
> the advantages of moving from an unstructured to a structured content
> repository.

Could you tell me more about this solution? I guess we will have to rename
the links, change the images and the layout, and add the access rights. What
else, and how?
This solution sounds good to me because it is only needed at startup. Later
the content will evolve and those pages will become outdated, but we need to
let the users modify them at the beginning.

First of all, let me say that we have never implemented such a solution ourselves; it is just one possible way of doing it. Given the many conditions that must hold for it to work (see below), most of our customers ended up migrating their content manually (and, along the way, taking the time to delete or clean up outdated content, modify or restructure the navigation, enter new metadata or categories, etc.). They kept their old HTML pages on a front-end Apache server and moved the content section by section as it was converted to Jahia. Meanwhile they added static cross-references from their new sites to the old static content. This is certainly the easiest way to migrate a lot of content.


Coming back to the suggested solution, you will have to script the following steps:
1) Crawl your existing site and capture its sitemap (which page links to which).
2) Remove the existing navigation, header, footer, etc. This may already be a tough step, especially if you did not use generic templates to build your site: what is a menu, what is just a list of cross-links to other pages, how do you remove it automatically, and so on.
3) Parse and clean the result with a tool such as Tidy.
4) Upload all the binary files and images to the Jahia WebDAV server and rewrite all the URLs to point to the Jahia DAV server.
5) Write a script in Jahia to a) create a new page for each entry of the sitemap captured in 1 and b) import the cleaned HTML fragment (cf. 3) into a Jahia big text field (e.g. the central column).
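
To give an idea of the effort, steps 1 and 4 above could be sketched roughly as follows in Python, using only the standard library. This is an illustration and not something we have run against a real site: the function names, the example URLs and the choice to keep only the file name of each image are all assumptions, and nothing here uses a Jahia API.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.page_links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.page_links.append(href)


def build_sitemap(pages):
    """Step 1: pages maps each page URL to its HTML text.

    Returns a dict mapping each page URL to the sorted list of other
    known pages it links to. External links are ignored.
    """
    sitemap = {}
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        # Resolve relative links and drop "#fragment" parts.
        targets = {urljoin(url, h).split("#")[0] for h in collector.page_links}
        sitemap[url] = sorted(t for t in targets if t in pages and t != url)
    return sitemap


def rewrite_asset_urls(html, dav_base):
    """Step 4: point every <img src="..."> at dav_base (hypothetical URL).

    Assumption: only the file name is kept, so two images with the same
    name in different folders would collide and need manual handling.
    """
    base = dav_base.rstrip("/")

    def repl(match):
        filename = match.group(2).rsplit("/", 1)[-1]
        return f"{match.group(1)}{base}/{filename}{match.group(3)}"

    pattern = r'(<img\b[^>]*?\bsrc=")([^"]+)(")'
    return re.sub(pattern, repl, html, flags=re.IGNORECASE)
```

The sitemap returned by build_sitemap is only the raw link graph; deciding which of those links form the real navigation hierarchy is exactly the hard part described in step 2.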


But this whole process is only possible if all your HTML pages are generic enough that everything that needs to be removed can be removed automatically and the existing navigation path can easily be found in them. If you have lots of different "templates", with links to other pages hardcoded all over the text and so on, automating this will become quite a mess.
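
One rough way to find out how generic your pages are is to try stripping the navigation from all of them and see how many fall through. The sketch below assumes, purely for illustration, that the navigation sits between two fixed marker strings (here hypothetical HTML comments); your pages may have nothing of the kind, in which case nearly everything will land in the manual list.

```python
def strip_block(html, start_marker, end_marker):
    """Remove the first region from start_marker through end_marker.

    Returns (cleaned_html, True) when both markers are found in order,
    otherwise (html unchanged, False).
    """
    start = html.find(start_marker)
    if start == -1:
        return html, False
    end = html.find(end_marker, start + len(start_marker))
    if end == -1:
        return html, False
    return html[:start] + html[end + len(end_marker):], True


def triage(pages, start_marker, end_marker):
    """Split pages into those cleaned automatically and those left over.

    pages maps a page name to its HTML text. Returns (cleaned, manual),
    where cleaned maps names to stripped HTML and manual lists the names
    of pages on which the markers were not found.
    """
    cleaned, manual = {}, []
    for name, html in pages.items():
        result, ok = strip_block(html, start_marker, end_marker)
        if ok:
            cleaned[name] = result
        else:
            manual.append(name)
    return cleaned, manual


if __name__ == "__main__":
    sample = {
        "a.html": "<body><!-- nav -->menu<!-- /nav --><p>Text</p></body>",
        "b.html": "<body><table>menu</table><p>Other</p></body>",
    }
    cleaned, manual = triage(sample, "<!-- nav -->", "<!-- /nav -->")
    print(len(cleaned), "cleaned;", len(manual), "to review by hand:", manual)
```

The share of pages ending up in the manual list gives a first estimate of how many exceptions you would have to treat by hand.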

Finally, this will never import 100% of your content. There will be a lot of exceptions to handle manually. So you really need to weigh the cost of developing and testing this automated migration, reviewing the exceptions, and then remigrating your content a bit later to more structured templates, against keeping it in HTML for a while and manually migrating your sites section by section, beginning with the most urgent ones (or outsourcing that task to a more affordable offshore country).

Good luck!
Stéphane






