On Fri, Nov 12, 2010 at 06:28:29PM -0500, Michael K. Johnson wrote:
> Once we're satisfied with the conversion, we can add code to load
> the contents into a new confluence instance as a "restoreBot" user
> to make it clear where the contents came from.  By keeping the HTML
> around as a backup, we can always figure out what was meant later
> when problems with the conversion are noticed and fix them up by
> hand.

I don't know how to handle the jira integration in the converter.
I see things like:
<div class='jiraissues_table' >
in the HTML that I have no idea how to convert into jira/confluence
syntax.  Can anyone lend me a hand there?  Otherwise I'll just let
someone fix those bits up after the fact.
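
Until someone knows the right syntax, one option is to at least list
where those blocks occur so they are easy to find and fix by hand later.
A minimal sketch, assuming only the Python standard library; the names
JiraTableFinder and find_jira_tables are made up for illustration and
are not part of chtml2jira:

```python
from html.parser import HTMLParser


class JiraTableFinder(HTMLParser):
    """Record every element whose class list includes jiraissues_table."""

    def __init__(self):
        super().__init__()
        self.hits = []  # list of (tag name, source line number)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if "jiraissues_table" in classes:
            # getpos() returns (line number, column offset)
            self.hits.append((tag, self.getpos()[0]))


def find_jira_tables(html_text):
    """Return (tag, line) pairs for each jiraissues_table element found."""
    finder = JiraTableFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits
```

Running find_jira_tables() over the text of each scraped .html file
would give a to-do list of pages that still need their jira tables
rebuilt manually.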

Other than that, the converter seems to be producing something that
looks remarkably like jira-style formatting as I know it.  I'm sure
there are bugs left, but it's ready for someone else to look at.

Anyone who would like to look, feel free to check out
http://bitbucket.org/johnsonm/foresight-confluence-recovery
and run:
./chtml2jira scraped-html-content/*.html
It will write a ".jira" file next to each .html file that looks like
it contains wiki content: 367 out of the 446 files that John
resurrected from the Google caches.

Note that a lot of that is redundant and we'll need to clean up:
_display_community_Example%2BDeveloper%2BApplication.jira
_display_community_Example%2BMember%2BApplication%3FshowComments%3Dfalse.jira
_display_community_Example%2BMember%2BApplication%3FshowComments%3Dtrue%26showCommentArea%3Dtrue.jira

Clearly, those three lines are only one page, really.  Removing
all references to showComments gets us 289 pages.  Removing the
references to the viewrecentblogposts and viewpage.action pages (I
haven't looked to see what those are) gets us down to 241 pages.
Removing replyToComment gets us down to 214 pages.  Removing
showChildren gets us down to 207 pages.  We'll clearly have to
clean out all the jsessionids from the page names, too.
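
The cleanup described above could also be done by reducing each file
name to a canonical page name and grouping the files that share one.
A rough sketch in Python, assuming the file names are URL-encoded paths
as in the listing above and that a session id shows up as an encoded
";jsessionid=..." segment (I have not checked the latter); the function
names canonical_page and group_duplicates are made up for illustration:

```python
import re
from collections import defaultdict
from urllib.parse import unquote

# Assumption: session ids appear as ";jsessionid=<value>" once decoded.
JSESSIONID = re.compile(r";jsessionid=[^;?]*", re.IGNORECASE)


def canonical_page(filename):
    """Strip the .jira suffix, session id, and query string from a name."""
    name = filename[:-len(".jira")] if filename.endswith(".jira") else filename
    name = unquote(name)             # %3F -> ?, %3D -> =, %2B -> +, ...
    name = JSESSIONID.sub("", name)  # drop any session id segment
    return name.split("?", 1)[0]     # drop showComments=... and friends


def group_duplicates(filenames):
    """Map each canonical page name to the files that render it."""
    groups = defaultdict(list)
    for filename in filenames:
        groups[canonical_page(filename)].append(filename)
    return dict(groups)
```

Any group with more than one file is a candidate for keeping a single
copy; which copy to keep would still need a human decision.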

Here's my command that should show mostly pages that are worth
looking at:
ls *.jira | grep -Ev '(3F(showComments|replyToComment|showChildren))|(viewrecentblogposts|viewpage\.action)'
_______________________________________________
Foresight-devel mailing list
Foresight-devel@lists.rpath.org
http://lists.rpath.org/mailman/listinfo/foresight-devel
