Hello, Alexander,

I have twice cleaned the database and started indexing anew, with the
same result.
When I comment out the Converter statements

    Converter application/pdf text/plain /usr/bin/pdftotext -q $in $out
    Converter application/postscript text/plain /usr/local/bin/pstotext

the indexing goes all right.

When indexing with the converters, the abnormally large directories are:
 657M    00w
 544M    01w
 630M    02w
 623M    03w
 632M    04w
 641M    05w
 659M    06w
 651M    07w
 595M    08w
 637M    09w
 653M    10w
 590M    11w
 608M    12w
 657M    13w
 621M    14w
 642M    15w
 327M    16w

Now I am trying to find out if the problem lies with .pdf or .ps files.

What is the format of the files in the 00w-99w directories? Is it
described somewhere?

    Regards,

        Gregory


-----Original Message-----
From: Alexander F Avdonkin [mailto:[EMAIL PROTECTED]]
Sent: Saturday, 6 July 2002 11:47
To: [EMAIL PROTECTED]
Subject: Re: [aseek-users]


Possibly this happened because of corrupted delta files. See which files
occupy the most space inside those directories.
The only solution here is to reindex everything from a clean DB.
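
One way to see which files occupy the most space is sketched below. It is only an illustration, assuming Python 3 is available on the indexing host; the function name `largest_files` and its `top` parameter are hypothetical and not part of ASPseek.

```python
import os


def largest_files(root, top=10):
    """Return up to `top` (size_in_bytes, path) pairs for the largest
    regular files under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so a link target is not counted twice.
            if os.path.islink(path):
                continue
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    sizes.sort(reverse=True)
    return sizes[:top]
```

For example, calling `largest_files("/var/aspseek/dbname/00w")` would list the ten largest files in the first of the oversized directories, which should show whether a few files account for most of the space.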

Alexander.


Gregory Kozlovsky wrote:

> Hello, ASPseekers,
>
> I installed aspseek-1.2.9 and started indexing into an empty database.
> However, the indexing stopped when 95390 docs were indexed and 352506 were
> found and not indexed. The reason is that /var/aspseek/dbname became huge
> and filled all the available space. With the old version, this directory
> took 5.2 G for about 2 million indexed docs; now it is 14 G. Here is the
> output of "du *" inside the directory:
>
> [root@isn-search]# du *
> 657M    00w
> 544M    01w
> 630M    02w
> 623M    03w
> 632M    04w
> 641M    05w
> 659M    06w
> 651M    07w
> 595M    08w
> 637M    09w
> 653M    10w
> 590M    11w
> 608M    12w
> 657M    13w
> 621M    14w
> 642M    15w
> 327M    16w
> 39M     17w
> 41M     18w
> 43M     19w
> 43M     20w
> 37M     21w
>
> The rest of the subdirectories are of normal size, around 50M. What is
> going wrong? One more suspicious thing is that I started indexing .pdf
> and .ps documents. Maybe the converters produce junk words? What
> converters do you people use?
>
>         Gregory Kozlovsky
>
> Project Manager for Information Systems                 Tel: +41 (0)1 632 63 70
> International Relations and Security Network (ISN)      Fax: +41 (0)1 632 14 13
> Center for Security Studies and Conflict Research       Email: [EMAIL PROTECTED]
> Swiss Federal Institute of Technology (ETH)             http://www.isn.ch
> Leonhardshalde 21, ETH-Zentrum / LEH
> CH-8092 Zürich, Switzerland
