Yes, so how can I solve it? By the way, it still doesn't work when I save the XML part of the data returned by http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning ; the XML file is attached.
armon

On Friday, June 22, 2012 at 5:59 AM, Lewis John Mcgibbney wrote:
> No, you're doing nothing incorrectly. I also get pretty dismal results
> with basic-crawler within Any23; please see below.
>
> lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local$ any23 rover http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning
> [1] 2956
> [2] 2957
> [3] 2958
> lewismc@lewismc-HP-Mini-110-3100:~/ASF/trunk/runtime/local$
> ------------------------------------------------------------------------
> Apache Any23 :: rover
> ------------------------------------------------------------------------
>
> @prefix dcterms: <http://purl.org/dc/terms/> .
>
> <http://en.wikipedia.org/w/api.php?action=query> dcterms:title "MediaWiki API Result" .
>
> ------------------------------------------------------------------------
> Apache Any23 SUCCESS
> Total time: 2s
> Finished at: Thu Jun 21 22:53:27 BST 2012
> Final Memory: 24M/483M
> ------------------------------------------------------------
> [1] Done any23 rover http://en.wikipedia.org/w/api.php?action=query
> [2]- Done list=search
> [3]+ Done srwhat=text
>
> The problem is that I don't know how crawler4j deals with characters
> such as '?' within URL strings, or whether it treats them as queries.
> By the looks of the log output above, the URL string is being treated
> incorrectly.
>
> Sitting above all of this is the fact that I don't think the wiki
> markup syntax is supported within the Any23 parser implementations.
>
> Lewis
>
> On Thu, Jun 21, 2012 at 10:29 PM, armon <[email protected]> wrote:
> > And even when I copy the XML part of the data at the URL as the input
> > content, it still doesn't work well, but when I try an RDF file, it
> > works well. Is there anything I am doing incorrectly?
> >
> > 2012/6/22 armon <[email protected]>
> > > Hi Lewis, thanks very much for your reply. I am sorry to interrupt
> > > you so late.
> > >
> > > The URL I used was:
> > >
> > > http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning
> > >
> > > and then I used the command: ./any23 rover url (shown above) to get
> > > the result.
> > >
> > > Thanks.
> > >
> > > armon
> > >
> > > 2012/6/22 Lewis John Mcgibbney <[email protected]>
> > > > Hi Armon,
> > > >
> > > > On Thu, Jun 21, 2012 at 4:15 PM, armon <[email protected]> wrote:
> > > > > Hi,
> > > > > I am currently transforming XML-format wiki data
> > > >
> > > > Can you give a small example of this XML?
> > > >
> > > > > (retrieved by the Wikipedia API) to Turtle,
> > > >
> > > > Also a small example of your Turtle.
> > > >
> > > > > but it seems that any23 can't
> > > > > work correctly. (I used the command: ./any23 rover url )
> > > >
> > > > What do you get on stdout? I am easily able to use any23 parsers on
> > > > fetching structure from Wikipedia pages... but this is not what you
> > > > are referring to... I need some more information from you please.
> > > >
> > > > > Does any23 actually support the XML data retrieved by the
> > > > > Wikipedia API as the input format?
> > > >
> > > > Please see above.
> > > >
> > > > --
> > > > Lewis
>
> --
> Lewis
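
A note on the log output quoted above: the lines "[1] 2956", "[2]- Done list=search" and "[3]+ Done srwhat=text" look like shell job-control messages, which would suggest the unquoted '&' characters made the shell split the command into background jobs before any23 ever saw the full URL (rover only reports http://en.wikipedia.org/w/api.php?action=query). A minimal sketch of quoting the URL so it reaches the program as one argument follows. It assumes a POSIX shell; count_args is a hypothetical helper used only to show the argument count, and the rover call is left commented out because its output for this URL is not known.

```shell
#!/bin/sh
# Single quotes keep '?' and '&' literal, so the shell does not treat '&'
# as the "run in background" operator.
url='http://en.wikipedia.org/w/api.php?action=query&list=search&srwhat=text&srsearch=meaning'

# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Double-quoting the variable passes the whole URL as a single argument.
n=$(count_args "$url")
echo "arguments passed: $n"

# The same quoting applied to the command from this thread (not run here):
# ./any23 rover "$url"
```

Quoting only addresses how the URL is handed to the program; it does not change whether Any23 can extract anything from the MediaWiki API XML itself.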
