[ https://issues.apache.org/jira/browse/CONNECTORS-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-256.
------------------------------------
Resolution: Fixed
r1182834.
> Connector for crawling Wikis
> ----------------------------
>
> Key: CONNECTORS-256
> URL: https://issues.apache.org/jira/browse/CONNECTORS-256
> Project: ManifoldCF
> Issue Type: New Feature
> Components: Wiki connector
> Affects Versions: ManifoldCF 0.4
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 0.4
>
>
> People have been trying to crawl wikis with ManifoldCF, but using the generic
> crawler is not a good way to do this. Instead, it looks like we really could
> use a wiki connector, which would understand the wiki API and thus crawl wiki
> content quickly and effectively.
> Some pertinent API references follow:
> I don't know whether it is possible to link to a wiki document with just the
> page ID, but it is possible to get the URL for a given page ID via the API:
> http://en.wikipedia.org/w/api.php?action=query&prop=info&pageids=27697087&inprop=url
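A minimal sketch of building that request and reading the URL back out of the reply. It assumes the English Wikipedia endpoint from the example, adds `format=json` (an assumption; the example URL omits it), and assumes the legacy JSON layout in which `pages` is an object keyed by page ID. The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the endpoint from the example above; other wikis use their own path.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"


def page_info_url(page_id: int) -> str:
    """Build the prop=info&inprop=url request for a single page ID."""
    params = {
        "action": "query",
        "prop": "info",
        "pageids": str(page_id),
        "inprop": "url",
        "format": "json",  # assumption: ask for JSON rather than the default output
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_full_url(response: dict, page_id: int) -> Optional[str]:
    """Return 'fullurl' from a decoded reply, or None if the page is absent.

    Assumes the legacy layout where 'pages' is keyed by the page ID string.
    """
    page = response.get("query", {}).get("pages", {}).get(str(page_id), {})
    return page.get("fullurl")
```

Keeping URL construction separate from response parsing lets the parsing be checked against canned replies without any network access.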
> It is possible to get a document's metadata directly by page ID (instead
> of by title):
> Title ->
> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API&rvprop=timestamp|user|comment|content
> PageID ->
> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&pageids=27697087&rvprop=timestamp|user|comment|content
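The two requests above differ only in whether `titles` or `pageids` is sent, so one helper can build either. This is a sketch under the same assumptions as before (example endpoint, added `format=json`); `revisions_url` is a hypothetical name, and multiple values are joined with `|` as in the examples.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

# Assumption: the endpoint from the examples above.
API_ENDPOINT = "http://en.wikipedia.org/w/api.php"
RVPROP = "timestamp|user|comment|content"


def revisions_url(titles: Optional[Iterable[str]] = None,
                  pageids: Optional[Iterable[int]] = None) -> str:
    """Build a prop=revisions request keyed by titles or by page IDs (not both)."""
    if (titles is None) == (pageids is None):
        raise ValueError("pass exactly one of titles or pageids")
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": RVPROP,
        "format": "json",  # assumption: JSON output
    }
    if titles is not None:
        params["titles"] = "|".join(titles)
    else:
        params["pageids"] = "|".join(str(p) for p in pageids)
    # urlencode percent-encodes '|' as %7C and spaces as '+'.
    return API_ENDPOINT + "?" + urlencode(params)
```

For example, `revisions_url(pageids=[27697087])` yields the page ID form of the request.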
> - There needs to be some notion of an overall list of pages:
> - http://www.mediawiki.org/wiki/API:Allpages
> - Example:
> http://en.wikipedia.org/w/api.php?action=query&list=allpages&apfrom=Kre&aplimit=5
> - Metadata information (author and pub date) also needs to be separated out
> in some way:
> - http://www.mediawiki.org/wiki/API:Properties#Revisions:_Example
> - Example:
> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main%20Page&rvprop=timestamp|user|comment|content
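The two needs listed above (an overall list of pages, and metadata kept apart from content) could be combined into a rough crawl loop: walk `list=allpages`, then fetch the latest revision for batches of titles and split author and date from the wikitext. This is a sketch, not the connector's implementation. It assumes a hypothetical `fetch` callable that sends parameters to api.php and returns decoded JSON, a MediaWiki version that returns the newer `continue` block (older releases used `query-continue`), and the legacy JSON layout where revision text sits under the `*` key.

```python
from typing import Callable, Dict, Iterator, List, NamedTuple

# Hypothetical transport: takes API parameters, returns the decoded JSON reply.
Fetch = Callable[[Dict[str, str]], dict]


class PageRecord(NamedTuple):
    title: str
    author: str     # 'user' of the latest revision
    timestamp: str  # timestamp of the latest revision
    comment: str
    content: str    # wikitext, kept separate from the metadata fields


def iter_titles(fetch: Fetch, start: str = "", limit: int = 50) -> Iterator[str]:
    """Walk list=allpages, following 'continue' until the API stops sending it."""
    params = {"action": "query", "list": "allpages",
              "aplimit": str(limit), "format": "json"}
    if start:
        params["apfrom"] = start
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]
        cont = data.get("continue")
        if not cont:
            return
        # Merge every continuation key into the next request.
        params = {**params, **cont}


def fetch_records(fetch: Fetch, titles: List[str]) -> List[PageRecord]:
    """Fetch the latest revision of several titles and split metadata from content."""
    data = fetch({"action": "query", "prop": "revisions",
                  "titles": "|".join(titles),
                  "rvprop": "timestamp|user|comment|content", "format": "json"})
    records = []
    for page in data.get("query", {}).get("pages", {}).values():
        revs = page.get("revisions")
        if not revs:
            continue  # e.g. missing pages carry no revisions
        rev = revs[0]  # assumption: multi-title queries return one revision per page
        records.append(PageRecord(page["title"], rev.get("user", ""),
                                  rev.get("timestamp", ""), rev.get("comment", ""),
                                  rev.get("*", "")))
    return records
```

Because `fetch` is injected, both functions can be exercised offline with a stub that returns canned dictionaries.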