This is very strange.
I don't completely understand what happened when you used "-s 0". How many
URLs were indexed?
Try sorting by depth value: use the -o switch along with -s 0.
Alexander.
----- Original Message -----
From: "Massimo Miccoli" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, March 16, 2001 6:50 PM
Subject: Re: [aseek-users] Aspseek limit
> The run_status_index is only a file with the output of the index process
> (index -r).
>
> The output of "index -S":
> ASPSeek database URL statistics
>
> Status    Expired      Total
> -----------------------------
>      0    2534940    2534940   Not indexed yet
>      1          0         13   Unknown status
>    200    2090805    5295002   OK
>    204          2          2   No content
>    299          5          5   Unknown status
>    300         63        104   Multiple Choices
>    301       5702      26726   Moved Permanently
>    302      16640      51540   Moved Temporarily
>    303          2          2   See Other
>    400         29         83   Bad Request
>    401       3407      19703   Unauthorized
>    403      15782      29490   Forbidden
>    404     133706     376890   Not found
>    406          1          2   Not Acceptable
>    407          2          4   Proxy Authentication Required
>    500        863       2313   Internal Server Error
>    502         33        254   Bad Gateway
>    503         85        110   Service Unavailable
>    504          1          3   Gateway Timeout
> -----------------------------
>  Total    4802068    8337186
>
> "Alexander F. Avdonkin" wrote:
>
> > What is the command run_status_index ?
> > Could you give me exact output of "index -S" ?
> > The number in the "total" file is usually less than the one reported by
> > "index -S", because "total" counts only non-empty URLs.
> >
> > Alexander.
> >
> > ----- Original Message -----
> > From: "Massimo Miccoli" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Friday, March 16, 2001 2:07 AM
> > Subject: Re: [aseek-users] Aspseek limit
> >
> > > Hi,
> > >
> > > I'm sorry, I am sending you more data about my problem.
> > > I've changed the period in aspseek.conf from the previous 25d to 7d.
> > > Is that a problem?
> > > Ended thread: 10. Start: 0.000. End: 0.000-984668298.634.
> > > Duration: 0.000. URL: http://www.nouvellesfrontieres.it/robots.txt
> > > Ended thread: 12. Start: 0.000. End: 0.000-984668298.635.
> > > Duration: 0.000. URL: http://www.comunie.messina.it/robots.txt
> > > Saving real-time database ... done.
> > > Saving delta files [..................................................] done.
> > > Loading ranks [..................................................] done.
> > > Saving citation [..................................................] done.
> > > Calculating ranks [..................................................] done.
> > > In: 83185017. Out: 83185017. Rank: 3416857.449604
> > > Urls: 8306827. Hrefs: 83185017
> > > index process finished.
> > >
> > > Massimo Miccoli wrote:
> > >
> > > > Hi,
> > > > I'm sorry, but the switch does not work.
> > > > The result is:
> > > > Ended thread: 14. Start: 0.000. End: 0.000-984668298.649.
> > > > Duration: 0.000. URL: http://adecco.it/robots.txt
> > > > Ended thread: 15. Start: 0.000. End: 0.000-984668298.649.
> > > > Duration: 0.000. URL: http://www.javasoft-mirror.java.t
> > > > Saving real-time database ... done.
> > > > Saving delta files [..................................................] done.
> > > >
> > > > The command I used is:
> > > > /usr/local/aspseek/sbin/index -N 16 -s 0 -f /usr/local/aspseek/etc/url_index -r /usr/local/aspseek/etc/logs/run_status_index -R 8
> > > >
> > > > The urls file contains the URLs with which I started the first indexing.
> > > > The result of index -S:
> > > > 8.300.345 urls
> > > > In total file: 5.200.231
> > > >
> > > > Thanks
> > > >
> > > > Massimo
> > > >
> > > > "Alexander F. Avdonkin" wrote:
> > > >
> > > > > You can use the command line switch "-s 0", which indexes only
> > > > > documents that have not been indexed yet.
> > > > >
> > > > > Alexander.
> > > > >
> > > > > ----- Original Message -----
> > > > > From: "Massimo Miccoli" <[EMAIL PROTECTED]>
> > > > > To: <[EMAIL PROTECTED]>
> > > > > Sent: Wednesday, March 14, 2001 11:03 PM
> > > > > Subject: Re: [aseek-users] Aspseek limit
> > > > >
> > > > > > Hi,
> > > > > > The question is...
> > > > > > How can I index the rest of the URLs that are in the statistics
> > > > > > result?
> > > > > > The command I've used:
> > > > > > /usr/local/aspseek/sbin/index -N 16 -f usr/local/aspseek/etc/url_index -r /usr/local/aspseek/etc/logs/run_status_index -R 8 &
> > > > > >
> > > > > > My box runs Linux kernel 2.4.2 and works fine.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Massimo
> > > > > >
> > > > > > "Alexander F. Avdonkin" wrote:
> > > > > >
> > > > > > > No, 5M URLs is an approximate limit. With this number of URLs,
> > > > > > > ASPseek requires about 700M of RAM to calculate the ranks of pages.
> > > > > > > If the number of URLs grows, swapping will occur during rank
> > > > > > > calculation.
> > > > > > >
> > > > > > > Alexander.
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > From: "Massimo Miccoli" <[EMAIL PROTECTED]>
> > > > > > > To: "aspseeklist" <[EMAIL PROTECTED]>
> > > > > > > Sent: Wednesday, March 14, 2001 3:12 AM
> > > > > > > Subject: [aseek-users] Aspseek limit
> > > > > > >
> > > > > > > > Is 5.000.000 URLs a hard limit for ASPseek?
> > > > > > > > How many pages can I index on a Linux box with dual Pentium III 900,
> > > > > > > > one GB of RAM and a 132 GB disk?
> > > > > > > > I've seen in the index statistics (index -S) that the number of
> > > > > > > > indexed pages is 5.209.600 and not 8.300.334.
> > > > > > > > So I ran index again (index -N 16 -f /urlfile -R 8), and at the
> > > > > > > > end the number of pages indexed is the same.
> > > > > > > > The first time I ran index I never stopped it; the work finished
> > > > > > > > normally at the end of the URL list and the discovered URLs.
> > > > > > > >
> > > > > > > > Thanks for your response,
> > > > > > > >
> > > > > > > > Massimo
> > > > > > > >
> > > > > >
> > >
>
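
The "index -S" statistics quoted in this thread can be cross-checked with a short script. What follows is only a sketch: the row layout (status, expired, total, description, separated by whitespace) is assumed from the quoted output, and the names STATS, ROW and parse_stats are made up for this illustration; they are not part of ASPseek.

```python
import re

# Rows copied from the "index -S" output quoted in this thread.
STATS = """\
0 2534940 2534940 Not indexed yet
1 0 13 Unknown status
200 2090805 5295002 OK
204 2 2 No content
299 5 5 Unknown status
300 63 104 Multiple Choices
301 5702 26726 Moved Permanently
302 16640 51540 Moved Temporarily
303 2 2 See Other
400 29 83 Bad Request
401 3407 19703 Unauthorized
403 15782 29490 Forbidden
404 133706 376890 Not found
406 1 2 Not Acceptable
407 2 4 Proxy Authentication Required
500 863 2313 Internal Server Error
502 33 254 Bad Gateway
503 85 110 Service Unavailable
504 1 3 Gateway Timeout
"""

# Assumed layout: status code, expired count, total count, free-text label.
ROW = re.compile(r"^(\d+)\s+(\d+)\s+(\d+)\s+(.*)$")


def parse_stats(text):
    """Return a list of (status, expired, total, label) tuples.

    Lines that do not match the assumed layout (headers, rulers,
    the final "Total" line) are skipped.
    """
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            status, expired, total, label = match.groups()
            rows.append((int(status), int(expired), int(total), label))
    return rows


rows = parse_stats(STATS)
expired_sum = sum(row[1] for row in rows)
total_sum = sum(row[2] for row in rows)
# Status 0 is reported as "Not indexed yet" in the quoted output.
not_indexed = next(row[2] for row in rows if row[0] == 0)

print("expired:", expired_sum)
print("total:", total_sum)
print("not indexed yet:", not_indexed)
```

The two sums agree with the "Total" line of the quoted output (4802068 expired, 8337186 total), and the status 0 row shows that 2534940 URLs were still waiting to be indexed at that point.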