Re: index web

2009-03-22 Thread 陈琛
Yes, you are right, the whole website has the two links, but the website was not created by me. If I have the opportunity, I will try. Thank you very much for the help, it really helped me a lot :) 2009/3/20 yanky young yanky.yo...@gmail.com not really i guess any page in this website

Re: index web

2009-03-20 Thread 陈琛
Thanks. You can go to http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110from=ePortal_NewsDetail_FromHome and look at the upper right corner: there are two translation links, and they lead to those two URLs, so I am worried. 2009/3/20 yanky young yanky.yo...@gmail.com that must work, but it seems

Re: index web

2009-03-20 Thread yanky young
I think my guess is right. I just looked at the source of that page. Those two URLs are generated by a JavaScript function: function jump(lan). In this case, Nutch might not be smart enough to recognize this kind of generated URL. But if you generate these two links on the server side, and then the urls in

Re: index web

2009-03-20 Thread yanky young
Not really. I guess any page in this website can have the two links generated by the JavaScript function. That's why Nutch can't find those URLs: Nutch will not click the link to trigger the JS function as a human does. I suggest that you generate those multilingual links on the server side, for
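
To illustrate the point about JavaScript-generated links, here is a minimal sketch of a static link extractor. It is not Nutch's actual parser, only a simplified stand-in that reads `href` attributes without executing any script. The two sample pages, the `extract_links` helper, and the `request_locale=en` value are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags without running any JavaScript."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip placeholders that only make sense once a script runs.
            if name == "href" and value and not value.startswith(("javascript:", "#")):
                self.hrefs.append(value)


def extract_links(html):
    """Return the URLs a purely static parser can see in the given HTML."""
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


# Hypothetical page: the language switch calls a JS function, so no URL is in the markup.
js_page = '<a href="#" onclick="jump(\'en\')">English</a>'

# Hypothetical page: the same link rendered on the server side as a plain href.
server_page = '<a href="/ePortal/news/detail.action?request_locale=en">English</a>'

print(extract_links(js_page))
print(extract_links(server_page))
```

Under these assumptions, the first page yields no links at all, while the server-rendered page exposes the URL directly in the markup, where a crawler can pick it up.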

Re: index web

2009-03-19 Thread yanky young
Hi, I guess the URLs you mentioned all point to the same JSP or servlet. Apparently they all begin with http://app02.laopdr.gov.la/ePortal/news/detail.action <http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110from=ePortal_NewsDetail_FromHome>. The difference is the request_locale

Re: index web

2009-03-19 Thread 陈琛
Thanks. The URL is http://www.laopdr.gov.la/... depth 15 topN 1200 ... seems must put

Re: index web

2009-03-19 Thread yanky young
That must work, but it seems weird. You know, Nutch crawls outward from the seed URL you gave, and the crawled pages actually form a tree whose root node is the seed URL. If you cannot reach those two URLs from the seed URL yourself, Nutch cannot either. yanky 2009/3/20 陈琛
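
The reachability argument above can be sketched as a breadth-first crawl over a link graph. This is a simplified model, not Nutch's real generate/fetch/update cycle: the `crawl` function, the in-memory `graph`, and the way `top_n` simply caps each round are assumptions made for illustration.

```python
def crawl(graph, seed, depth, top_n):
    """Fetch pages in rounds starting from the seed.

    graph  -- dict mapping a URL to the list of URLs it links to
    depth  -- number of rounds to run
    top_n  -- maximum number of pages fetched per round
    """
    seen = {seed}
    frontier = [seed]
    fetched = []
    for _ in range(depth):
        batch = frontier[:top_n]
        if not batch:
            break
        fetched.extend(batch)
        discovered = []
        for url in batch:
            for link in graph.get(url, []):
                if link not in seen:
                    seen.add(link)
                    discovered.append(link)
        # Pages that did not fit in this round wait for the next one.
        frontier = frontier[top_n:] + discovered
    return fetched


# Hypothetical site: "orphan" exists but nothing reachable from the seed links to it.
graph = {
    "seed": ["a", "b"],
    "a": ["c"],
    "orphan": ["x"],
}

print(crawl(graph, "seed", depth=3, top_n=10))
```

In this model, a page that no fetched page links to never enters the frontier, no matter how large `depth` and `top_n` are, which is why such URLs would have to be added as seeds or exposed as plain links.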