Hello, Alexander,
I cleaned the database and restarted indexing from scratch twice, with
the same result.
When I comment out the Converter statements
Converter application/pdf text/plain /usr/bin/pdftotext -q $in $out
Converter application/postscript text/plain /usr/local/bin/pstotext
the indexing works fine.
With the converters enabled, the abnormally large directories are:
657M 00w
544M 01w
630M 02w
623M 03w
632M 04w
641M 05w
659M 06w
651M 07w
595M 08w
637M 09w
653M 10w
590M 11w
608M 12w
657M 13w
621M 14w
642M 15w
327M 16w
Now I am trying to find out if the problem lies with .pdf or .ps files.
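One rough way to check whether a converter emits junk is to convert a single sample by hand, using the same command line as in the Converter statement, and compare the total number of words with the number of distinct words. The sketch below is only an illustration: the helper name word_stats and the sample file names are hypothetical, and an unusually high share of distinct words is a hint of junk output, not proof.

```shell
# Print "TOTAL DISTINCT" word counts for a plain-text file.
# word_stats is a hypothetical helper name, not part of ASPseek.
word_stats() {
    # Turn every run of non-alphanumeric characters into one newline,
    # so each word ends up on its own line, then count with awk.
    tr -cs '[:alnum:]' '\n' < "$1" |
        awk 'NF { total++; seen[$0] = 1 }
             END { n = 0; for (w in seen) n++; print total + 0, n }'
}

# Example (sample.pdf and sample.txt are hypothetical file names):
#   /usr/bin/pdftotext -q sample.pdf sample.txt && word_stats sample.txt
```

Running the same check on the output of the PostScript converter for a sample .ps file would show which of the two formats produces the suspicious text.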
What is the format of the files in the 00w-99w directories? Is it
described somewhere?
Regards,
Gregory
-----Original Message-----
From: Alexander F Avdonkin [mailto:[EMAIL PROTECTED]]
Sent: Saturday, July 6, 2002 11:47
To: [EMAIL PROTECTED]
Subject: Re: [aseek-users]
This could be caused by corrupted delta files. Check which files
take up the most space inside those directories.
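A minimal sketch of one way to do that check with standard tools, assuming the /var/aspseek/dbname layout quoted below; the helper name largest_files is hypothetical and not part of ASPseek:

```shell
# List the largest regular files under a directory tree, biggest first.
# Usage: largest_files DIR [COUNT]   (COUNT defaults to 10)
largest_files() {
    dir=$1
    count=${2:-10}
    # du -k prints "size-in-KiB<TAB>path" for each file;
    # sort numerically in descending order and keep the top entries.
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}

# Example, using a directory named in this thread:
#   largest_files /var/aspseek/dbname/00w 5
```

Comparing the output for one of the oversized directories with that of a normal-sized one should show which files account for the difference.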
The only solution here is to reindex everything from a clean DB.
Alexander.
Gregory Kozlovsky wrote:
> Hello, ASPseekers,
>
> I installed aspseek-1.2.9 and started indexing into an empty database.
> However, the indexing stopped when 95390 docs were indexed and 352506
> were found but not indexed. The reason is that /var/aspseek/dbname
> became huge and filled all the available space. With the old version,
> this directory was 5.2 G for about 2 million indexed docs; now it is
> 14 G. Here is the output of "du *" inside the directory:
>
> [root@isn-search]# du *
> 657M 00w
> 544M 01w
> 630M 02w
> 623M 03w
> 632M 04w
> 641M 05w
> 659M 06w
> 651M 07w
> 595M 08w
> 637M 09w
> 653M 10w
> 590M 11w
> 608M 12w
> 657M 13w
> 621M 14w
> 642M 15w
> 327M 16w
> 39M 17w
> 41M 18w
> 43M 19w
> 43M 20w
> 37M 21w
>
> The rest of the subdirectories are a normal size, around 50M. What is
> going wrong? One more suspicious thing is that I started indexing .pdf
> and .ps documents. Maybe the converters produce junk words? Which
> converters do you use?
>
> Gregory Kozlovsky
>
> Project Manager for Information Systems              Tel: +41 (0)1 632 63 70
> International Relations and Security Network (ISN)   Fax: +41 (0)1 632 14 13
> Center for Security Studies and Conflict Research    Email: [EMAIL PROTECTED]
> Swiss Federal Institute of Technology (ETH)          http://www.isn.ch
> Leonhardshalde 21, ETH-Zentrum / LEH
> CH-8092 Zürich, Switzerland