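Take a look at the org.apache.lucene.index.IndexReader.numDocs() method. It returns the number of documents in an index, not counting deleted ones, so you can write a small utility around it and run it from the shell.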
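Something along these lines should do it. This is only a rough sketch, assuming a Lucene 2.x jar on the classpath (where the static IndexReader.open(String) method is available); the class name IndexDocCount is made up for illustration and is not part of Nutch or Lucene.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;

/**
 * Prints the number of documents in one or more Lucene indexes.
 * Hypothetical helper: the name IndexDocCount is not part of Nutch or Lucene.
 */
public class IndexDocCount {

    /** Opens the index in indexDir and returns numDocs() (deleted documents are not counted). */
    static int countDocs(String indexDir) throws IOException {
        // Assumption: Lucene 2.x API, where IndexReader.open(String) exists.
        IndexReader reader = IndexReader.open(indexDir);
        try {
            return reader.numDocs();
        } finally {
            // Always release the index files, even if numDocs() fails.
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Usage: IndexDocCount <index-dir> [<index-dir> ...]");
            System.exit(1);
        }
        long total = 0;
        for (int i = 0; i < args.length; i++) {
            int n = countDocs(args[i]);
            // One line per index: path, a tab, then the document count.
            System.out.println(args[i] + "\t" + n);
            total += n;
        }
        if (args.length > 1) {
            System.out.println("total\t" + total);
        }
    }
}
```

Compile it against the Lucene jar that ships with your Nutch install, then call it from your script with the index directory (or directories) as arguments. Each line of output is the path, a tab, and the count, so it is easy to pick apart with cut or awk.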
On 7/28/07, Enzo Michelangeli <[EMAIL PROTECTED]> wrote:
> Is there a quick way of knowing how many pages are indexed (_not_ how many
> are referenced in crawldb as fetched URL's)? I could use Luke to peek inside
> the indexes and get the "Number of documents", but they are located on a
> remote headless server with only SSH access... (OK, I actually did access
> them using Sftpdrive, but I'd like to have a command line to invoke in a
> shell script...)
>
> Enzo

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general