On Fri, Nov 12, 2010 at 06:28:29PM -0500, Michael K. Johnson wrote:
> Once we're satisfied with the conversion, we can add code to load
> the contents into a new confluence instance as a "restoreBot" user
> to make it clear where the contents came from.  By keeping the HTML
> around as a backup, we can always figure out what was meant later
> when problems with the conversion are noticed and fix them up by
> hand.
I don't know how to handle the jira integration in the converter.
I see things like:

    <div class='jiraissues_table' >

in the HTML that I have no idea how to convert into jira/confluence
syntax.  Can anyone lend me a hand there?  Otherwise I'll just let
someone fix those bits up after the fact.

Other than that, the converter seems to be producing something that
looks remarkably like jira-style formatting as I know it.  I'm sure
there are bugs left, but it's ready for someone else to look at.

Anyone who would like to look, feel free to check out
http://bitbucket.org/johnsonm/foresight-confluence-recovery
and run:

    ./chtml2jira scraped-html-content/*.html

It will write a ".jira" file next to each .html file that looks
like it has wiki content in it -- 367 out of the 446 files that
John resurrected from the Google caches.

Note that a lot of that is redundant and we'll need to clean up:

    _display_community_Example%2BDeveloper%2BApplication.jira
    _display_community_Example%2BMember%2BApplication%3FshowComments%3Dfalse.jira
    _display_community_Example%2BMember%2BApplication%3FshowComments%3Dtrue%26showCommentArea%3Dtrue.jira

Clearly, those three lines are only one page, really.

Removing all references to showComments gets us 289 pages.
Removing the references to the viewrecentblogposts and
viewpage.action pages (I haven't looked to see what those are)
gets us down to 241 pages.  Removing replyToComment gets us down
to 214 pages.  Removing showChildren gets us down to 207 pages.

We'll clearly have to clean out all the jsessionids from the
page names, too.

Here's my command that should show mostly pages that are worth
looking at:

    ls *.jira | grep -Ev '(3F(showComments|replyToComment|showChildren))|(viewrecentblogposts|viewpage.action)'
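
For anyone who would rather script the cleanup than eyeball the grep output, here is a rough Python sketch of the same filter that also collapses jsessionid variants of a page name. The skip pattern is the one from the grep command above. The jsessionid pattern is only a guess: it assumes the session id shows up URL-encoded in the file name as "%3Bjsessionid%3D" followed by hex digits, which I have not checked against the scraped files. The function name canonical_pages is made up for this example and is not part of chtml2jira.

```python
import re

# Same exclusions as the grep -Ev command: comment/children views of a
# page, plus the viewrecentblogposts and viewpage.action pages.
SKIP = re.compile(
    r"(3F(showComments|replyToComment|showChildren))"
    r"|(viewrecentblogposts|viewpage\.action)"
)

# ASSUMPTION: session ids appear as "%3Bjsessionid%3D<hex digits>"
# (or the unencoded ";jsessionid=<hex digits>") inside the file name.
JSESSIONID = re.compile(r"(%3B|;)jsessionid(%3D|=)[0-9A-Fa-f]+", re.IGNORECASE)


def canonical_pages(filenames):
    """Return the sorted, de-duplicated names of files worth reviewing."""
    pages = set()
    for name in filenames:
        if SKIP.search(name):
            continue  # redundant view of a page we already have
        # Drop the session id so variants of one page collapse together.
        pages.add(JSESSIONID.sub("", name))
    return sorted(pages)
```

Feeding it the .jira names from the scraped-html-content directory (for example via os.listdir) should give a list close to what the grep shows, minus session-id duplicates if the guess about their format holds.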
