How are you documents named? is it alphabetical or numerical, Mine where numerical so I I creates n directories like so
11 , 12, 13, 14, .... 19, 21 , 22 , 23 .. ........ 99 you get the idea and I stored the files into the directories that each belonged to depending on the last two numbers in the file name (you could use file size to shuffle the files around as well (ie, use the 2 rightmost numbers in the file size in bytes) so at this point you'll have shuffled your million docs into 100 directories and then Lucene can spider through each set of directories indexing let's say 5000 files at a time and then deleting them or moving them into another location, it you get 100 million files simply up the precision on the directory to a 3 digit setup or a 4 digit setup (once you automate it, sky's the limit)


Hope this helps

Nader Henein


Michael Wechner wrote:

I try to index around a million documents. The problem is
that I run out of memory during sorting by uid when I go through
the directory recursively.

Well, I could add more memory, but this wouldn't really solve my problem,
because at some point I will always run out of memory (e.g. 10 million documents).


Is there another approach than sorting by uid?

Thanks

Michi


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to