Yes, you are right: every page on the site has those two links.

But the website was not created by me. If I get the opportunity, I will try it.

Thank you very much for the help. It really helped me a lot. :)


2009/3/20 yanky young <yanky.yo...@gmail.com>

> Not really.
>
> I guess any page on this website can have the two links generated by a
> JavaScript function. That is why Nutch can't find those URLs: Nutch does not
> click the link to trigger the JS function the way a human does.
>
> I suggest generating those multilingual links on the server side, for
> example in JSP, so that the web pages contain static links that Nutch can
> find.
>
> For example, in your JSP page those two links currently look like this:
>
> <a href="javascript:jump('en')">English</a>
> <a href="javascript:jump('la')">La</a>
>
> Nutch cannot find these two links, so you can change your JSP like
> this:
> <%
> // getRequestURI() returns the path without the query string, so re-append it.
> String query = request.getQueryString();
> String pageUrl = request.getRequestURI() + "?" + (query == null ? "" : query + "&");
> String enUrl = pageUrl + "request_locale=en";
> String laUrl = pageUrl + "request_locale=la";
> %>
> <a href="<%=enUrl%>">English</a>
> <a href="<%=laUrl%>">La</a>
>
> Then you get static URLs in your pages when you browse them.
>
> good luck
>
> yanky
>
> 2009/3/20 陈琛 <kylin.chc...@gmail.com>
>
> > thanks very much!!!
> >
> >
> > In other words, for now I should only put
> >
> > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome
> > and
> > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> > in url.txt?
> >
> >
> > 2009/3/20 yanky young <yanky.yo...@gmail.com>
> >
> > > I think my guess is right. I just looked at the source of that page.
> > >
> > > Those two URLs are generated by a JavaScript function:
> > >
> > > function jump(lan)
> > >
> > > In this case, Nutch is probably not smart enough to recognize this kind
> > > of generated URL.
> > >
> > > But if you generate these two links on the server side, so that the
> > > URLs in the web pages are static links, then Nutch can crawl them as
> > > usual.
> > >
> > > good luck
> > >
> > > yanky
> > >
> > >
> > > 2009/3/20 陈琛 <kylin.chc...@gmail.com>
> > >
> > > > thanks
> > > >
> > > > You can go to
> > > >
> > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110&from=ePortal_NewsDetail_FromHome
> > > >
> > > > and notice that the upper right corner has two translation links, which
> > > > lead to those two URLs.
> > > >
> > > > So I am worried.
> > > >
> > > > 2009/3/20 yanky young <yanky.yo...@gmail.com>
> > > >
> > > > > That should work, but it seems odd. Nutch crawls outward from the
> > > > > seed URL you gave, so the set of crawled pages is actually a tree
> > > > > whose root node is the seed URL. If you cannot reach those two URLs
> > > > > from the seed URL yourself, Nutch cannot either.
> > > > >
> > > > > yanky
> > > > >
> > > > >
> > > > > 2009/3/20 陈琛 <kylin.chc...@gmail.com>
> > > > >
> > > > > > thanks..
> > > > > > The seed URL is http://www.laopdr.gov.la/...
> > > > > > with depth 15, topN 1200 ...
> > > > > >
> > > > > > It seems I must put
> > > > > >
> > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> > > > > >
> > > > > > in the urls directory.
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2009/3/19 yanky young <yanky.yo...@gmail.com>
> > > > > >
> > > > > > > Hi:
> > > > > > >
> > > > > > > I guess the URLs you mentioned all point to the same JSP or
> > > > > > > servlet; apparently they all begin with
> > > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action.
> > > > > > > The difference is the request_locale parameter. I have no idea how
> > > > > > > these two URLs with different request_locale values are generated,
> > > > > > > but I guess Nutch just doesn't know about the request_locale
> > > > > > > parameter, because it may be added by JavaScript or by a backend
> > > > > > > content management system. Maybe you can write these links into a
> > > > > > > page that Nutch can crawl. The point is that these links must
> > > > > > > appear somewhere in the pages of your website. If they do not,
> > > > > > > Nutch cannot find them.
> > > > > > >
> > > > > > > good luck
> > > > > > >
> > > > > > > yanky
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 2009/3/19 陈琛 <kylin.chc...@gmail.com>
> > > > > > >
> > > > > > > > please help me, it is Urgent and Important, thanks
> > > > > > > >
> > > > > > > > ---------- Forwarded message ----------
> > > > > > > > From: 陈琛 <kylin.chc...@gmail.com>
> > > > > > > > Date: 2009/3/19
> > > > > > > > Subject: index web
> > > > > > > > To: nutch-user@lucene.apache.org
> > > > > > > >
> > > > > > > >
> > > > > > > > hi, all:
> > > > > > > >
> > > > > > > > I can get a URL like this one indexed:
> > > > > > > >
> > > > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110&from=ePortal_NewsDetail_FromHome
> > > > > > > >
> > > > > > > > but I cannot get URLs like these indexed:
> > > > > > > >
> > > > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome
> > > > > > > >
> > > > > > > > and
> > > > > > > >
> > > > > > > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> > > > > > > >
> > > > > > > > Why are they not indexed?
> > > > > > > > Is there anything different about these pages?
> > > > > > > >
> > > > > > > > Please notice "request_locale=".
> > > > > > > >
> > > > > > > >
> > > > > > > > thanks
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
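
Note on the server-side suggestion quoted above: the idea is to build the locale links on the server so that they appear as plain hrefs a crawler can follow. Below is a minimal sketch in plain Java of how the URL building could look. The class name LocaleLinks and the method withLocale are hypothetical names for illustration, not part of the site's code or of Nutch. The path and query values are copied from the URLs in this thread; in a JSP they would presumably come from request.getRequestURI() and request.getQueryString().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a static URL for a given locale.
public class LocaleLinks {

    /**
     * Returns path + "?" + query with request_locale set to the given locale.
     * Any request_locale already present in the query is replaced.
     * Values are copied as-is (no URL encoding or decoding is attempted).
     */
    public static String withLocale(String path, String query, String locale) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("request_locale", locale);
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                if (pair.isEmpty()) {
                    continue; // skip empty segments such as "a=1&&b=2"
                }
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                if (!key.equals("request_locale")) {
                    params.put(key, value);
                }
            }
        }
        StringBuilder url = new StringBuilder(path);
        char separator = '?'; // first parameter uses '?', later ones use '&'
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url.append(separator)
               .append(entry.getKey())
               .append('=')
               .append(entry.getValue());
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Values taken from the URLs discussed in this thread.
        String path = "/ePortal/news/detail.action";
        String query = "id=10110&from=ePortal_NewsDetail_FromHome";
        System.out.println(withLocale(path, query, "en_US"));
        System.out.println(withLocale(path, query, "lo_LA"));
    }
}
```

Links built this way and written into the page as ordinary <a href="..."> elements should be visible to a crawler that only reads the HTML, which is the point yanky makes above. Whether the site expects en_US/lo_LA or the shorter en/la values passed to jump() is not confirmed in the thread.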
