No, you're not doing anything incorrectly. I get pretty dismal results as well, including with basic-crawler within Any23; please see below.

lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local$ any23 rover http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning
[1] 2956
[2] 2957
[3] 2958
lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local$ ------------------------------------------------------------------------
Apache Any23 :: rover
------------------------------------------------------------------------

@prefix dcterms: <http://purl.org/dc/terms/> .

<http://en.wikipedia.org/w/api.php?action=query> dcterms:title "MediaWiki API Result" .
------------------------------------------------------------------------
Apache Any23 SUCCESS
Total time: 2s
Finished at: Thu Jun 21 22:53:27 BST 2012
Final Memory: 24M/483M
------------------------------------------------------------

[1]   Done                    any23 rover http://en.wikipedia.org/w/api.php?action=query
[2]-  Done                    list=search
[3]+  Done                    srwhat=text

The problem is that I don't know how crawler4j deals with some characters, such as '?', within URL strings, and whether it treats them as queries or not. By the looks of the log output above, the URL string is being treated incorrectly.

Sitting above all of this is the fact that I don't think the wiki markup syntax is supported within Any23 parser implementations.

Lewis

On Thu, Jun 21, 2012 at 10:29 PM, armon <[email protected]> wrote:

> and even when I copy the xml part of data in the url as the input content,
> it still can't work well, but when I try a rdf file, it works well, is
> there anything I do incorrectly?
>
>
> 2012/6/22 armon <[email protected]>
>
>> Hi Lewis, thanks very much for your reply, I am sorry to interrupt you so
>> late,
>>
>> the url I used was:
>>
>>
>> http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning
>>
>>
>> and then I used command: ./any23 rover url(showed above) to run the
>> result.
>>
>> thanks.
>>
>> armon
>>
>>
>> 2012/6/22 Lewis John Mcgibbney <[email protected]>
>>
>>> Hi Armon,
>>>
>>> On Thu, Jun 21, 2012 at 4:15 PM, armon <[email protected]> wrote:
>>> > Hi,
>>> > I do some data transform currently from xml-format wiki data
>>>
>>> Can you give a small example of this xml?
>>>
>>> > (retrieved by wikipedia API) to turtle,
>>>
>>> Also a small example of your turtle
>>>
>>> > but it seems that the any23 can't
>>> > work correctly. (I used the command: ./any23 rover url )
>>>
>>> What do you get to std out? I am easily able to use any23 parsers on
>>> fetching structure from wikipedia pages... but this is not what you
>>> are referring to... I need some more information from you please.
>>>
>>> >
>>> > Does any23 actually support the xml data retrieved by wikipedia API
>>> > as the input format ?
>>>
>>> Please see above
>>>
>>>
>>> --
>>> Lewis
>>>
>>
>>

--
Lewis
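
A note on the terminal output in this message: the job lines ([1] 2956, [2] 2957, [3] 2958) and the later "Done" lines indicate that the shell itself split the command at each unquoted '&' and ran the pieces as background jobs, so any23 only received the URL up to "action=query". The following is a minimal sketch, assuming a POSIX-like shell such as bash, of how quoting keeps the URL in one piece. The count_args helper and the variable names are hypothetical and only there for illustration; whether Any23 can then extract anything useful from the API response is a separate question.

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it was called with.
count_args() { echo "$#"; }

# Single quotes stop the shell from interpreting '&' and '?'.
url='http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning'

# Quoted, the whole URL arrives as exactly one argument.
quoted_count=$(count_args "$url")
echo "arguments received when quoted: $quoted_count"

# Unquoted on the command line, the shell cuts the command at the first '&'.
# This reproduces the part of the URL that any23 was actually given.
first_part=${url%%&*}
echo "URL seen by the command when unquoted: $first_part"

# The same quoting applied to the command from this thread
# (commented out so the sketch runs without Any23 or network access):
# any23 rover "$url"
```

The truncated value printed by the second echo matches the subject URI in the rover output above, which is consistent with the split happening in the shell before Any23 or crawler4j ever sees the string.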
