This is very strange.
I don't completely understand what happened when you used "-s 0". How many
URLs were indexed?
Try sorting by depth value: use the -o switch along with -s 0.

Alexander.

----- Original Message -----
From: "Massimo Miccoli" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, March 16, 2001 6:50 PM
Subject: Re: [aseek-users] Aspseek limit


> run_status_index is just a file containing the output of the index process
> (index -r).
>
> The output of "index -S":
> ASPSeek database URL statistics
>
>     Status    Expired      Total
>    -----------------------------
>          0    2534940    2534940 Not indexed yet
>          1          0         13 Unknown status
>        200    2090805    5295002 OK
>        204          2          2 No content
>        299          5          5 Unknown status
>        300         63        104 Multiple Choices
>        301       5702      26726 Moved Permanently
>        302      16640      51540 Moved Temporarily
>        303          2          2 See Other
>        400         29         83 Bad Request
>        401       3407      19703 Unauthorized
>        403      15782      29490 Forbidden
>        404     133706     376890 Not found
>        406          1          2 Not Acceptable
>        407          2          4 Proxy Authentication Required
>        500        863       2313 Internal Server Error
>        502         33        254 Bad Gateway
>        503         85        110 Service Unavailable
>        504          1          3 Gateway Timeout
>    -----------------------------
>      Total    4802068    8337186
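The table above can be cross-checked mechanically. The following is a minimal Python sketch, not part of ASPseek: it parses the rows exactly as quoted, sums both columns, and pulls out the "Not indexed yet" and "OK" counts. The names `INDEX_S_OUTPUT` and `parse_stats` are hypothetical, and the row layout (status, expired, total, description) is assumed from the output shown here.

```python
import re

# Rows copied from the "index -S" output quoted above.
INDEX_S_OUTPUT = """\
         0    2534940    2534940 Not indexed yet
         1          0         13 Unknown status
       200    2090805    5295002 OK
       204          2          2 No content
       299          5          5 Unknown status
       300         63        104 Multiple Choices
       301       5702      26726 Moved Permanently
       302      16640      51540 Moved Temporarily
       303          2          2 See Other
       400         29         83 Bad Request
       401       3407      19703 Unauthorized
       403      15782      29490 Forbidden
       404     133706     376890 Not found
       406          1          2 Not Acceptable
       407          2          4 Proxy Authentication Required
       500        863       2313 Internal Server Error
       502         33        254 Bad Gateway
       503         85        110 Service Unavailable
       504          1          3 Gateway Timeout
"""

def parse_stats(text):
    """Return a list of (status, expired, total, description) tuples."""
    rows = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(\d+)\s+(\d+)\s+(.*)", line)
        if match:
            status, expired, total = (int(match.group(i)) for i in (1, 2, 3))
            rows.append((status, expired, total, match.group(4).strip()))
    return rows

rows = parse_stats(INDEX_S_OUTPUT)
expired_sum = sum(row[1] for row in rows)
total_sum = sum(row[2] for row in rows)
not_indexed = next(row[2] for row in rows if row[0] == 0)
ok_count = next(row[2] for row in rows if row[0] == 200)

print("Sum of Expired column:", expired_sum)
print("Sum of Total column:  ", total_sum)
print("Not indexed yet:      ", not_indexed)
print("Status 200 (OK):      ", ok_count)
print("Fetched at least once:", total_sum - not_indexed)
```

Both column sums agree with the Total row (4802068 and 8337186), so the table is internally consistent: about 2.5M URLs are still waiting for a first fetch.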
>
> "Alexander F. Avdonkin" wrote:
>
> > What is the run_status_index command?
> > Could you give me the exact output of "index -S"?
> > The number in the "total" file is usually lower than the one reported by
> > "index -S" because "total" counts only non-empty URLs.
> >
> > Alexander.
> >
> > ----- Original Message -----
> > From: "Massimo Miccoli" <[EMAIL PROTECTED]>
> > To: <[EMAIL PROTECTED]>
> > Sent: Friday, March 16, 2001 2:07 AM
> > Subject: Re: [aseek-users] Aspseek limit
> >
> > > Hi,
> > >
> > > I'm sorry,
> > > I'm sending you more data about my problem.
> > > I've changed the period in aspseek.conf from the previous 25d to 7d. Is
> > > that a problem?
> > > Ended thread: 10. Start:         0.000. End: 0.000-984668298.634.
> > > Duration:    0.000. URL: http://www.nouvellesfrontieres.it/robots.txt
> > > Ended thread: 12. Start:         0.000. End: 0.000-984668298.635.
> > > Duration:    0.000. URL: http://www.comunie.messina.it/robots.txt
> > > Saving real-time database ... done.
> > > Saving delta files [..................................................] done.
> > > Loading ranks [..................................................] done.
> > > Saving citation [..................................................] done.
> > > Calculating ranks [..................................................] done.
> > > In: 83185017. Out: 83185017. Rank: 3416857.449604
> > > Urls: 8306827. Hrefs: 83185017
> > > index process finished.
> > >
> > > Massimo Miccoli wrote:
> > >
> > > > Hi,
> > > > I'm sorry, but the switch does not work.
> > > > The result is:
> > > > Ended thread: 14. Start:         0.000. End: 0.000-984668298.649.
> > > > Duration:    0.000. URL: http://adecco.it/robots.txt
> > > > Ended thread: 15. Start:         0.000. End: 0.000-984668298.649.
> > > > Duration:    0.000. URL: http://www.javasoft-mirror.java.t
> > > > Saving real-time database ... done.
> > > > Saving delta files [..................................................] done.
> > > >
> > > > The command I used is:
> > > > /usr/local/aspseek/sbin/index -N 16 -s 0 -f /usr/local/aspseek/etc/url_index -r /usr/local/aspseek/etc/logs/run_status_index -R 8
> > > >
> > > > The URL file contains the URLs with which I started the first indexing.
> > > > The result of index -S:
> > > > 8.300.345 urls
> > > > In the total file: 5.200.231
> > > >
> > > > Thanks
> > > >
> > > > Massimo
> > > >
> > > > "Alexander F. Avdonkin" wrote:
> > > >
> > > > > You can use the command line switch "-s 0", which indexes only
> > > > > documents that have not been indexed yet.
> > > > >
> > > > > Alexander.
> > > > >
> > > > > ----- Original Message -----
> > > > > From: "Massimo Miccoli" <[EMAIL PROTECTED]>
> > > > > To: <[EMAIL PROTECTED]>
> > > > > Sent: Wednesday, March 14, 2001 11:03 PM
> > > > > Subject: Re: [aseek-users] Aspseek limit
> > > > >
> > > > > > Hi,
> > > > > > > The question is:
> > > > > > > How can I index the rest of the URLs that appear in the statistics
> > > > > > > result?
> > > > > > > The command I used:
> > > > > > > /usr/local/aspseek/sbin/index -N 16 -f usr/local/aspseek/etc/url_index -r /usr/local/aspseek/etc/logs/run_status_index -R 8 &
> > > > > > >
> > > > > > > My box runs Linux kernel 2.4.2 and works fine.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Massimo
> > > > > >
> > > > > > > "Alexander F. Avdonkin" wrote:
> > > > > >
> > > > > > > > No, 5M URLs is an approximate limit. With this number of URLs,
> > > > > > > > ASPseek requires about 700M of RAM to calculate the ranks of pages.
> > > > > > > > If the number of URLs grows, swapping will occur during rank
> > > > > > > > calculation.
> > > > > > >
> > > > > > > Alexander.
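The figures in this reply allow a rough back-of-the-envelope estimate. The Python sketch below assumes that rank-calculation memory scales linearly with the URL count and that "700M" means 700 MiB; neither assumption is confirmed in this thread, and the function names are hypothetical. It ignores memory used by the operating system and other processes.

```python
# Figures from the reply above: about 5M URLs need about 700M of RAM
# for rank calculation.
REFERENCE_URLS = 5_000_000
REFERENCE_RAM_BYTES = 700 * 1024 * 1024  # assumption: "700M" means 700 MiB

ONE_GIB = 1024 ** 3  # the 1 GB of RAM mentioned in the original question


def estimated_rank_ram_bytes(url_count):
    """Linear extrapolation from the reference point (an assumption,
    not documented ASPseek behaviour)."""
    return url_count * REFERENCE_RAM_BYTES // REFERENCE_URLS


def fits_in_ram(url_count, ram_bytes):
    """True if the extrapolated requirement does not exceed ram_bytes."""
    return estimated_rank_ram_bytes(url_count) <= ram_bytes


# 5M is the reference point; 8_300_334 is the URL count from the question.
for urls in (5_000_000, 8_300_334):
    need_mib = estimated_rank_ram_bytes(urls) / 2 ** 20
    verdict = "fits" if fits_in_ram(urls, ONE_GIB) else "does not fit"
    print(f"{urls:>9} URLs: about {need_mib:.0f} MiB, {verdict} in 1 GiB")
```

Under these assumptions the roughly 8.3M URLs reported in this thread would need more than the 1 GB installed, which is consistent with the warning about swapping.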
> > > > > > >
> > > > > > > ----- Original Message -----
> > > > > > > From: "Massimo Miccoli" <[EMAIL PROTECTED]>
> > > > > > > To: "aspseeklist" <[EMAIL PROTECTED]>
> > > > > > > Sent: Wednesday, March 14, 2001 3:12 AM
> > > > > > > Subject: [aseek-users] Aspseek limit
> > > > > > >
> > > > > > > > > Is 5.000.000 URLs a hard limit for Aspseek?
> > > > > > > > > How many pages can I index on a Linux box with dual Pentium III
> > > > > > > > > 900, one GB of RAM and a 132 GB disk?
> > > > > > > > > I've seen in the index statistics (index -S) that the number of
> > > > > > > > > indexed pages is 5.209.600 and not 8.300.334.
> > > > > > > > > So I re-ran index (index -N 16 -f /urlfile -R 8), and at the
> > > > > > > > > end the number of indexed pages was the same.
> > > > > > > > > The first time I ran index I never stopped it; it finished
> > > > > > > > > normally at the end of the URL list and the discovered URLs.
> > > > > > > > >
> > > > > > > > > Thanks for the response,
> > > > > > > >
> > > > > > > > Massimo
> > > > > > > >
> > > > > >
> > >
>
