[ 
https://issues.apache.org/jira/browse/CONNECTORS-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127531#comment-13127531
 ] 

Karl Wright commented on CONNECTORS-256:
----------------------------------------

Can you be as specific as possible about what you expected to see in the 
content and what you actually got?  Also, can you point your browser at your 
MediaWiki instance with a URL of the form:

http://xxxx/api.php?format=xml&action=query&prop=revisions&pageids=<your_page>&rvprop=user%7ccomment%7ccontent

The page content should be included in the XML response.  Can you compare what 
you see there with what you EXPECT to see?
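The same check can be scripted. This is a minimal sketch, assuming the legacy XML layout in which each revision's wikitext is the text of a `<rev>` element (newer MediaWiki releases may nest content under `rvslots` instead). The host `http://xxxx`, the page ID 42, the helper names `revisions_url` and `extract_content`, and the sample response are all hypothetical placeholders, not output captured from a real wiki.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def revisions_url(base: str, page_id: int) -> str:
    """Build the api.php query for one page's latest revision (hypothetical helper)."""
    params = {
        "format": "xml",
        "action": "query",
        "prop": "revisions",
        "pageids": str(page_id),
        # urlencode percent-encodes '|' as %7C, as in the hand-written URL.
        "rvprop": "user|comment|content",
    }
    return base.rstrip("/") + "/api.php?" + urllib.parse.urlencode(params)


def extract_content(xml_text: str) -> dict:
    """Map page ID -> (user, comment, content), assuming the legacy XML layout."""
    root = ET.fromstring(xml_text)
    result = {}
    for page in root.iter("page"):
        rev = page.find("revisions/rev")
        if rev is None:
            continue  # e.g. a missing page, or no revision returned
        result[int(page.get("pageid"))] = (
            rev.get("user"),
            rev.get("comment"),
            rev.text or "",
        )
    return result


# Hypothetical response, shaped like MediaWiki's legacy XML output.
SAMPLE = """<?xml version="1.0"?>
<api><query><pages>
<page pageid="42" ns="0" title="Example">
<revisions><rev user="Alice" comment="typo fix" xml:space="preserve">Hello '''wiki'''</rev></revisions>
</page>
</pages></query></api>"""

if __name__ == "__main__":
    print(revisions_url("http://xxxx", 42))
    print(extract_content(SAMPLE))
```

Pasting the printed URL into a browser and running the real response through `extract_content` gives the raw wikitext that the connector should be indexing, which makes the expected-versus-actual comparison concrete.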


                
> Connector for crawling Wikis
> ----------------------------
>
>                 Key: CONNECTORS-256
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-256
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Wiki connector
>    Affects Versions: ManifoldCF 0.4
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 0.4
>
>
> People have been trying to crawl wikis with ManifoldCF, but using the generic 
> crawler is not a good way to do this.  Instead, it looks like we really could 
> use a wiki connector, which would understand the wiki API and thus crawl wiki 
> content quickly and effectively.
> Some pertinent API references follow:
> I don't know if it is possible to link to a wiki document with just the 
> pageid, but it is possible to get the URL for a given pageid via the API:
> http://en.wikipedia.org/w/api.php?action=query&prop=info&pageids=27697087&inprop=url
> It is possible to get the metadata of a document directly using the page's id 
> (instead of its title):
> Title -> 
> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API&rvprop=timestamp|user|comment|content
> PageID -> 
> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&pageids=27697087&rvprop=timestamp|user|comment|content
> - There needs to be some notion of an overall list of pages:
>        - http://www.mediawiki.org/wiki/API:Allpages
>        - Example: 
> http://en.wikipedia.org/w/api.php?action=query&list=allpages&apfrom=Kre&aplimit=5
> - Metadata information (author and pub date) also needs to be separated out 
> in some way:
>        - http://www.mediawiki.org/wiki/API:Properties#Revisions:_Example
>        - Example:  
> http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=API|Main%20Page&rvprop=timestamp|user|comment|content
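The API calls quoted above can be combined into a small enumeration loop. This is a minimal sketch, assuming JSON output and the newer `continue`-style paging (older MediaWiki releases return a `query-continue` element instead, which this code does not handle). The names `api_url`, `fetch_json`, `all_pages` and `page_metadata`, and the User-Agent string, are hypothetical and not part of any ManifoldCF connector; the fetch function is passed in so the paging logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API = "http://en.wikipedia.org/w/api.php"
Fetch = Callable[[str], dict]


def api_url(params: dict) -> str:
    """Build an api.php URL requesting JSON output."""
    return API + "?" + urllib.parse.urlencode({"format": "json", **params})


def fetch_json(url: str) -> dict:
    """Fetch and decode one API response (needs network access)."""
    # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
    req = urllib.request.Request(url, headers={"User-Agent": "wiki-crawl-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def all_pages(fetch: Fetch, apfrom: str = "", limit: int = 5) -> Iterator[dict]:
    """Yield {pageid, ns, title} entries from list=allpages, following 'continue' tokens."""
    params = {"action": "query", "list": "allpages", "apfrom": apfrom, "aplimit": str(limit)}
    while True:
        data = fetch(api_url(params))
        yield from data.get("query", {}).get("allpages", [])
        cont = data.get("continue")
        if not cont:
            break
        params.update(cont)  # adds apcontinue=... and continue=... for the next batch


def page_metadata(fetch: Fetch, page_id: int) -> dict:
    """Look up the URL plus the latest revision's user and timestamp for one page ID."""
    params = {
        "action": "query",
        "prop": "info|revisions",
        "pageids": str(page_id),
        "inprop": "url",
        "rvprop": "timestamp|user",
    }
    page = fetch(api_url(params))["query"]["pages"][str(page_id)]
    rev = page["revisions"][0]
    # The latest revision's user is the last editor, not necessarily the page creator.
    return {"url": page["fullurl"], "author": rev["user"], "timestamp": rev["timestamp"]}
```

For example, `itertools.islice(all_pages(fetch_json, apfrom="Kre"), 10)` would list the first ten pages from "Kre" onward, and `page_metadata(fetch_json, 27697087)` would look up the page ID used in the examples above. Keeping enumeration (by page ID) separate from metadata lookup mirrors the split suggested in the description: the page list drives the crawl, and author and date are fetched per document.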

--
This message is automatically generated by JIRA.
