Yes, you are right, every page on the site has those two links, but I did not build the site. If I get the opportunity, I will try that.
Thank you very much for the help, it really helped me a lot. :)

2009/3/20 yanky young <yanky.yo...@gmail.com>

> Not really.
>
> I guess every page on this website has two links generated by a JavaScript
> function. That's why Nutch can't find those URLs: Nutch does not click the
> link to trigger the JS function the way a human does.
>
> I suggest you generate the multilingual links on the server side, for
> example in JSP, so that the pages contain static links Nutch can find.
>
> For example, in your JSP page those two links currently look like this:
>
> <a href="javascript:jump('en')">English</a>
> <a href="javascript:jump('la')">La</a>
>
> Nutch cannot find these two links, so you can change your JSP like this:
>
> <%
>   // getRequestURI() returns the path without the query string, so keep
>   // the existing parameters (if any) and append request_locale to them.
>   String query = request.getQueryString();
>   String pageUrl = request.getRequestURI() + "?" + (query == null ? "" : query + "&");
>   String enUrl = pageUrl + "request_locale=en";
>   String laUrl = pageUrl + "request_locale=la";
> %>
> <a href="<%=enUrl%>">English</a>
> <a href="<%=laUrl%>">La</a>
>
> Then you get static URLs in your pages when you browse.
>
> Good luck,
>
> yanky
>
> 2009/3/20 陈琛 <kylin.chc...@gmail.com>
>
> > Thanks very much!!!
> >
> > In other words, for now I only put
> >
> > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome
> >
> > and
> >
> > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> >
> > in url.txt?
> >
> > 2009/3/20 yanky young <yanky.yo...@gmail.com>
> >
> > > I think my guess is right. I just looked at the source of that page.
> > >
> > > Those two URLs are generated by a JavaScript function:
> > >
> > > function jump(lan)
> > >
> > > In this case, Nutch might not be smart enough to recognize this kind
> > > of generated URL.
> > >
> > > But if you generate these two links on the server side, so that the
> > > URLs in the web pages are static links, then Nutch can crawl as usual.
> > >
> > > Good luck,
> > >
> > > yanky
> > >
> > > 2009/3/20 陈琛 <kylin.chc...@gmail.com>
> > >
> > > > Thanks.
> > > >
> > > > You can go to
> > > >
> > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110&from=ePortal_NewsDetail_FromHome
> > > >
> > > > and look at the upper right corner: there are two translation links
> > > > there, and they lead to those two URLs.
> > > >
> > > > So I am worried.
> > > >
> > > > 2009/3/20 yanky young <yanky.yo...@gmail.com>
> > > >
> > > > > That must work, but it seems weird. You know, Nutch crawls outward
> > > > > from the seed URL you gave it, and the crawled pages as a whole
> > > > > actually form a tree whose root node is the seed URL. If you cannot
> > > > > reach those two URLs from the seed URL yourself, Nutch cannot either.
> > > > >
> > > > > yanky
> > > > >
> > > > > 2009/3/20 陈琛 <kylin.chc...@gmail.com>
> > > > >
> > > > > > Thanks.
> > > > > > The URL is http://www.laopdr.gov.la/...
> > > > > > depth 15 topN1200 ...
> > > > > >
> > > > > > It seems I must put
> > > > > >
> > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> > > > > >
> > > > > > in the urls directory.
> > > > > >
> > > > > > 2009/3/19 yanky young <yanky.yo...@gmail.com>
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I guess the URLs you mentioned all point to the same JSP or
> > > > > > > servlet; apparently they all begin with
> > > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action.
> > > > > > > The difference is the request_locale parameter. I have no idea
> > > > > > > how these two URLs with different request_locale values are
> > > > > > > generated, but I guess Nutch simply never sees the
> > > > > > > request_locale parameter, because it may be added by JavaScript
> > > > > > > or by a backend content management system. Maybe you can write
> > > > > > > these links into a page that Nutch can crawl. The point is that
> > > > > > > these links must appear somewhere in the pages of your website.
> > > > > > > If they do not, Nutch cannot find them.
> > > > > > >
> > > > > > > Good luck,
> > > > > > >
> > > > > > > yanky
> > > > > > >
> > > > > > > 2009/3/19 陈琛 <kylin.chc...@gmail.com>
> > > > > > >
> > > > > > > > Please help me, it is urgent and important. Thanks.
> > > > > > > >
> > > > > > > > ---------- Forwarded message ----------
> > > > > > > > From: 陈琛 <kylin.chc...@gmail.com>
> > > > > > > > Date: 2009/3/19
> > > > > > > > Subject: index web
> > > > > > > > To: nutch-user@lucene.apache.org
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I can get URLs like this one indexed:
> > > > > > > >
> > > > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110&from=ePortal_NewsDetail_FromHome
> > > > > > > >
> > > > > > > > but not URLs like
> > > > > > > >
> > > > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome
> > > > > > > >
> > > > > > > > and
> > > > > > > >
> > > > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> > > > > > > >
> > > > > > > > Why are they not indexed? Is there any difference between
> > > > > > > > the pages?
> > > > > > > >
> > > > > > > > Please note the "request_locale=" parameter.
> > > > > > > >
> > > > > > > > Thanks
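
When the site itself cannot be changed, the fallback discussed in the thread is to list the locale-specific URLs in the seed file directly. The following is only a minimal Java sketch of that idea and is not part of Nutch: the class name SeedUrls and the method withLocale are hypothetical, and it assumes request_locale is an ordinary query parameter and that the URL carries no fragment. It prints one seed URL per locale, which could then be pasted into url.txt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Nutch API): builds per-locale seed URLs. */
final class SeedUrls {

    /** Returns url with request_locale set to locale, replacing any existing value. */
    static String withLocale(String url, String locale) {
        int q = url.indexOf('?');
        String path = q < 0 ? url : url.substring(0, q);
        String query = q < 0 ? "" : url.substring(q + 1);

        List<String> params = new ArrayList<>();
        params.add("request_locale=" + locale);
        for (String p : query.split("&")) {
            // Keep every existing parameter except an old request_locale.
            if (!p.isEmpty() && !p.startsWith("request_locale=")) {
                params.add(p);
            }
        }
        return path + "?" + String.join("&", params);
    }

    public static void main(String[] args) {
        // Page URL taken from the thread; the locale codes are the two it mentions.
        String page = "http://app02.laopdr.gov.la/ePortal/news/detail.action"
                + "?id=10110&from=ePortal_NewsDetail_FromHome";
        for (String locale : new String[] {"en_US", "lo_LA"}) {
            System.out.println(withLocale(page, locale));
        }
    }
}
```

Seeding this way only covers the pages that are listed explicitly. Locale variants of other pages would still be missed unless the links are made static on the server side, as yanky suggests.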