I've rebased this and pushed it to branch `db/6534`.  I resolved several 
conflicts, so please start further work off of that branch.  All specific 
examples below are from https://github.com/mxcl/homebrew/wiki/

* When a wiki is deleted, the title is changed.  This should be refactored 
into a `delete()` method on the model.  Even though it's just one line, it 
shouldn't be repeated in multiple places.
* In `convert_markup()`
    * `name_and_ext = filename.split('.', 1)` should be changed to use 
`os.path.splitext` (or at least `rsplit`).  It splits at the first dot, so 
pages with dots in their names are not handled correctly: "Not a wiki page 
Homebrew-0.9.3.md. Skipping"
    * we need to handle markdown specially: don't do any conversion.  We lose 
some formatting when it goes through render_any_markup and then back through 
html2text (e.g. External-Commands.md loses its table structure).  We'll also 
need to keep the original markdown anyway, for [#6622].
* alignment of "Import wiki history" checkbox on the individual import form is 
weird
* it looks like gollum is case-insensitive, e.g. [Tips n' Tricks] on 
Home.textile.  Can we cleanly support that too?
* Acceptable-Formulae.textile
    * extra newlines are inserted (iirc, fix with: `html2text.BODY_WIDTH = 0`)
    * "&" in gollum tag doesn't work
* textile-specific issues (mostly from Acceptable-Formulae.textile).  These can 
be a separate ticket that we merge later.  I want to merge this main wiki 
branch soon :)
    * after "There are good reasons for this:" should be a numbered list
    * table structure is lost
    * `Niche Stuff <a name="Niche_Stuff"></a>`
    * `*[[this checklist|Troubleshooting]]*` doesn't convert right 
(Home.textile)
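
To illustrate the `convert_markup()` filename issue above, here is a minimal sketch comparing the three ways of splitting.  Only the filename comes from the log message quoted above; the variable names are just for illustration.

```python
import os.path

filename = 'Homebrew-0.9.3.md'

# Current code: splits at the FIRST dot, so the version number
# ends up in the "extension" part
first_dot = filename.split('.', 1)

# rsplit splits at the LAST dot, giving the intended name/extension pair
last_dot = filename.rsplit('.', 1)

# os.path.splitext also splits at the last dot, and keeps the dot
# on the extension
name, ext = os.path.splitext(filename)

print(first_dot)
print(last_dot)
print(name, ext)
```

One reason to prefer `os.path.splitext` over `rsplit`: it always returns a 2-tuple, with an empty extension for a filename that has no dot, so unpacking into two names can't fail.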


---

**[tickets:#6534] Wiki importer for github**

**Status:** in-progress
**Labels:** import github 42cc 
**Created:** Wed Aug 07, 2013 09:54 PM UTC by Dave Brondsema
**Last Updated:** Thu Sep 26, 2013 03:27 PM UTC
**Owner:** nobody

Wikis are git repositories and can be cloned directly, for example `git clone 
https://github.com/OpenRefine/OpenRefine.wiki`.  Check the main 
repo API first to see if the repo has wiki enabled.  See 
https://sourceforge.net/p/googlecodewikiimporter/git/ for an 
example of another wiki importer.  It is a separate repo because it needs the 
"html2text" package to convert html to markdown, and that is a GPL library.
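
A rough sketch of the "check the API first" step.  It assumes the repository API response (`GET /repos/{owner}/{repo}`) carries the `has_wiki` and `html_url` fields; the helper name `wiki_clone_url` is made up for this example, and the sample dict is hand-written, not a real API response.

```python
def wiki_clone_url(repo_info):
    """Return the wiki clone URL for a repo, or None if the wiki is disabled.

    `repo_info` is the parsed JSON dict from the repository API.
    (Hypothetical helper, not existing importer code.)
    """
    if not repo_info.get('has_wiki'):
        return None
    # The wiki repo lives next to the main repo, with a ".wiki" suffix
    return repo_info['html_url'].rstrip('/') + '.wiki'


# Hand-written stand-in for an API response, trimmed to the two fields used
sample = {
    'html_url': 'https://github.com/OpenRefine/OpenRefine',
    'has_wiki': True,
}
print(wiki_clone_url(sample))
```

The wiki flag may be set even when no pages have been created yet, so the importer should probably still cope with a failed clone.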

GitHub supports many markup types.  Find a full list and determine what the 
best way to convert them to markdown is.  My guess is that few formats will 
have tools available to convert them directly to markdown, so my likely 
recommendation would be to render them as HTML (using 
[pypeline](http://pypeline.sourceforge.net/) as a generic way to handle many of 
those formats) and then html2text to get it into markdown.
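
The render-then-convert idea could be wired up roughly like this.  It is only a sketch: `to_markdown`, `MARKDOWN_EXTS` and the two stub functions are hypothetical names, the extension set is an assumed and probably incomplete list, and the real renderer (e.g. pypeline) and html2text would be passed in by the caller so nothing GPL is imported here.  Markdown pages are passed through untouched so they don't lose formatting in a round trip.

```python
import os.path

# Extensions treated as already being markdown (assumed, incomplete set)
MARKDOWN_EXTS = {'.md', '.markdown', '.mdown', '.mkdn'}


def to_markdown(filename, text, render_html, html_to_markdown):
    """Convert one wiki page to markdown.

    render_html(filename, text) -> html and html_to_markdown(html) -> str
    are supplied by the caller; this function only decides the route.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext in MARKDOWN_EXTS:
        # Already markdown: keep the original text, no conversion
        return text
    return html_to_markdown(render_html(filename, text))


# Stubs standing in for the real renderer and for html2text
def fake_render(filename, text):
    return '<p>%s</p>' % text


def fake_html_to_markdown(html):
    return html.replace('<p>', '').replace('</p>', '\n')


print(to_markdown('Home.md', '# Hi', fake_render, fake_html_to_markdown))
print(to_markdown('Home.textile', 'h1. Hi', fake_render, fake_html_to_markdown))
```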

If html2text or any other GPL library is needed, this will have to be a 
separate repo from the main Allura repo.  So please evaluate & test the 
conversion options first, before putting code into place.

A second phase to all this (i.e. do it separately, after the basic import is 
all working) would be to handle revision history.  This would mean going 
through each commit in the wiki git repo, and converting & updating every file 
that changes.  This may be very time consuming, so when we get to it, we may 
want it to be a checkbox option, so users only do it if they want it.
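
For the history phase, one possible way to find out which files each commit touched is to parse `git log --reverse --name-status --format=%H`.  This is a sketch, not a proposal for the final design; `parse_name_status` and `wiki_history` are hypothetical names, and the sample input in the usage lines is hand-written.

```python
import re
import subprocess

# A line that is exactly a full 40-character commit hash
COMMIT_RE = re.compile(r'^[0-9a-f]{40}$')


def parse_name_status(output):
    """Parse `git log --name-status --format=%H` output into a list of
    (commit_hash, [(status_letter, path), ...]) tuples, in input order."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        if COMMIT_RE.match(line):
            commits.append((line, []))
        elif '\t' in line and commits:
            parts = line.split('\t')
            # Renames look like "R100<TAB>old<TAB>new": keep only the
            # status letter and the new path
            commits[-1][1].append((parts[0][0], parts[-1]))
    return commits


def wiki_history(repo_dir):
    """Oldest-first commits of a cloned wiki repo, with the files each touched."""
    out = subprocess.check_output(
        ['git', 'log', '--reverse', '--name-status', '--format=%H'],
        cwd=repo_dir)
    return parse_name_status(out.decode('utf-8'))


# Hand-written sample of the expected output shape
sample = ('a' * 40 + '\n\nA\tHome.md\n'
          + 'b' * 40 + '\n\nM\tHome.md\nR100\tOld.textile\tNew.textile\n')
for commit, changes in parse_name_status(sample):
    print(commit[:7], changes)
```

Only the files listed for a commit would need converting and updating, which should keep the per-commit work small even if the overall import is slow.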

