Hello,
Automatically importing content into a CMS is never trivial.
First of all, are we speaking about limited structured content (e.g. an XML newsfeed) or about unstructured content which mixes content, layout, files, etc. across hundreds of different pages (e.g. typical HTML web sites)?
As you may understand, the former is quite easy to import, as you can easily "map" it to the heavily structured relationships between content objects inside a CMS. In Jahia you can do this, for example, by developing a small JSP which automatically creates and populates some container lists.
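To illustrate why the structured case is the easy one, here is a minimal sketch in Python (not Jahia code). The feed layout and the field names title, date and body are assumptions made up for this example; the point is only that each feed item maps one-to-one onto a record that an import JSP could then turn into a container.

```python
import xml.etree.ElementTree as ET

# Hypothetical newsfeed; the element names are assumptions for illustration.
SAMPLE_FEED = """<news>
  <item>
    <title>Office move</title>
    <date>2004-08-31</date>
    <body>The team moves to the second floor.</body>
  </item>
  <item>
    <title>New intranet</title>
    <date>2004-09-01</date>
  </item>
</news>"""


def feed_to_records(xml_text):
    """Map every <item> of the feed to a flat dict with fixed field names."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            # findtext returns None when the child element is missing,
            # so fall back to an empty string.
            "title": (item.findtext("title") or "").strip(),
            "date": (item.findtext("date") or "").strip(),
            "body": (item.findtext("body") or "").strip(),
        })
    return records


if __name__ == "__main__":
    for record in feed_to_records(SAMPLE_FEED):
        print(record)
```

Each dict would correspond to one container in a container list; how the containers are actually created depends on your Jahia templates and is not shown here.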
The latter is far more complex, as you need to turn unstructured content into structured content.
Of course, you can simply import your raw HTML files into a basic Jahia HTML template. You extract or recreate the sitemap in Jahia, remove the navigation from your HTML files (with some kind of Perl script), import these files as HTML files in Jahia, and automatically map them to a basic template with just one large unstructured HTML area in the central column. The editors can then call a WYSIWYG HTML editor to edit their text. You will also have to clean up the HTML with a tool such as Tidy (http://tidy.sourceforge.net/) to be sure that the imported HTML fragment does not "break" the full page in which it is included. I must say that I am not convinced at all by this solution, as it just looks like a kind of FTP access to some HTML files. You lose all the advantages of moving from an unstructured to a structured content repository.
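The "remove the navigation" step could be sketched as follows, here in Python with the standard html.parser module rather than Perl. It assumes that the navigation sits in elements with a known id (the id "nav" is hypothetical) and that the markup is reasonably well nested, so you would probably run Tidy first. It only returns the fragment inside <body>; the import into Jahia itself is not shown.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FragmentExtractor(HTMLParser):
    """Collect the markup inside <body>, skipping elements with given ids."""

    def __init__(self, skip_ids):
        # Keep entities such as &eacute; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.skip_ids = set(skip_ids)
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            return
        if not self.in_body:
            return
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        if dict(attrs).get("id") in self.skip_ids:
            if tag not in VOID_TAGS:
                self.skip_depth = 1
            return
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as-is when not skipped.
        if (self.in_body and not self.skip_depth
                and dict(attrs).get("id") not in self.skip_ids):
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
            return
        if not self.in_body:
            return
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
            return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body and not self.skip_depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body and not self.skip_depth:
            self.parts.append(f"&#{name};")


def extract_fragment(html_text, skip_ids=("nav",)):
    """Return the body content of a page without the skipped elements."""
    parser = FragmentExtractor(skip_ids)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Sites that build their navigation with layout tables and no ids would need a different rule for spotting what to drop, which is exactly where this approach becomes site-specific work.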
Then there are also more complex tools (e.g. http://www.kapowtech.com/) which can help you extract structured content from unstructured web sites. But they are quite expensive and of course quite complex to use.
So in the end you will need to make a complete audit of the content you currently have: whether it is worth migrating to the new system (most of the time 50% of the content could go in the trash because it is out of date), how it is structured, how it can be extracted in the cleanest and most structured manner, and whether an automated migration script makes sense or whether it is easier and more affordable to have interns manually cut and paste it into new structured templates.
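A first pass at such an audit can be automated. The sketch below (Python, standard library only) assumes the old site is available as a directory tree on disk and uses the file modification time as a rough proxy for "out of date"; both the one-year threshold and the function name audit_site are assumptions for illustration. It counts files per extension and lists the ones not touched since the cutoff.

```python
import os
import time
from collections import Counter


def audit_site(root, stale_after_days=365, now=None):
    """Count files per extension and list files older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    by_type = Counter()
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_type[ext] += 1
            # Modification time is only a hint; a human still has to decide.
            if os.path.getmtime(path) < cutoff:
                stale.append(os.path.relpath(path, root))
    return by_type, sorted(stale)


if __name__ == "__main__":
    counts, old_files = audit_site(".")
    for ext, count in counts.most_common():
        print(f"{ext}: {count}")
    print(f"{len(old_files)} file(s) not modified in the last year")
```

The output is only a starting point for the manual review: an old page may still be valid, and a recently touched one may still be useless.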
Please read the following articles which may help you get a better idea of the overall problem:
Migrating Legacy Content http://www.cmswatch.com/Features/TopicWatch/FeaturedTopic/?feature_id=88
Web Content Migration Project Design - Part 1 http://www.cmswatch.com/Features/TopicWatch/FeaturedTopic/?feature_id=103
Web Content Migration Project Design - Part 2 http://www.cmswatch.com/Features/TopicWatch/FeaturedTopic/?feature_id=105
When Word-XML Conversions Get Nasty http://www.cmswatch.com/Features/TopicWatch/FeaturedTopic/?feature_id=98
Regards,
Stéphane Croisier
At 09:33 31/08/2004, you wrote:
Hi,
First of all, I'm new to Jahia.
We have an intranet site with 5,000 pages and documents (Word, PDF) and we would like to import this site into Jahia. I know that we can include HTML pages in Jahia directly, but we also need to let users modify those pages, which requires changing the links and adding access permissions. Is there a way to import a whole site into Jahia?
Thanks a lot, Olivier.
