There was a recent discussion on the postgres PERFORMENCE mailing list (check out their archive) about using over 4GM of ram on linux. AFAIR the gist of it was that while a single process can't use more than 4G, the kernel can and *will* use the additional memory for caching.
So I'd recommend that you just use the 1.5GB for Lucene, run some kind of automated tests, run vmstat/iostat and see if your disks are getting hit. That should be much less complicated that RAMDirectories, spliting JVMs, etc. Regards, Dror On Fri, Oct 31, 2003 at 10:43:08AM -0500, Philippe Laflamme wrote: > Having more RAM does not necessarily mean you can use it in your > process. Keep in mind that a Xeon is a 32 bit x86 architecture, hence > can only physically address 4GB of RAM. > > This means that theoretically a process cannot access more than 4GB (all > of them accounted for can add up to 16GB and more due to swap). > > I say theoretically because there are other limitations. Last I heard, > Linux can only access 2GB of RAM per process. This is even worse for a > Java VM since it needs to allocate its memory in one chunk. > > This Java VM limitation makes the available memory even lower: look at > /proc/<pid>/maps of a Java process, you'll see that dynamic libraries > create holes in the memory map that a VM cannot access. A Java VM on > Linux is limited to approximatly 1.5GB. Try it: java -Xmx2000m will fail > on a 32 bit Linux machine. There might be some kernel pathes that I am > not aware of though. > > If your 6M docs take up more than what you can access in one process, > you'll have to split up your processing into multiple VMs. Each VM could > load a RAMDirectory index that fits into the limitations. Then you'd > have another process that could access these "distributed" indexes. > > Yes 16GB of RAM is defenitly fun... on a 64 bit architecture. > > Phil > > On Fri, 2003-10-31 at 08:23, Otis Gospodnetic wrote: > > Wow, with 16GB RAM, I would definitely load the index into RAM. You > > can use RAMDirectory(Directory) constructor for that. > > > > As for RAMDrives..... I have no experience with those, but I have heard > > of some people using ramfs under Linux. Ramfs is a memory based > > filesystem. Mount it and you have it. Unmount it and it is gone. > > > > Otis > > > > > > --- jt oob <[EMAIL PROTECTED]> wrote: > > > Hi, > > > I am currently indexing around 6 million text > > > documents using lucene. > > > > > > We have a new server arriving in the next few weeks > > > which the queries will be run on. With the following > > > stats: Dell 6650 - 4 x Xeon HT CPU's, 16 GB RAM, 36GB > > > SCSI Ultra160 Hdd. (connected to 1.5TB IDE RAID with > > > actual source documents) > > > > > > What is the best strategy for fast searches? > > > Do I need to write a server which holds the indexes in > > > a RAMDirectory? > > > > > > When should RAMDirectories generally be used? I have > > > read several articles saying that RAMDrives under > > > linux > > > are rarely a good idea, but am not sure on how to > > > interpret this in the context of lucene and > > > RAMDirectories. > > > > > > I have looked over the documentation I have found on > > > the lucene web site - hope i haven't missed something. > > > > > > Is there a general guide to building large search > > > engines with lucene? I am very new to the whole field. > > > > > > Thanks for the great software! > > > jt > > > > > > __________________________________ > > Do you Yahoo!? > > Exclusive Video Premiere - Britney Spears > > http://launch.yahoo.com/promos/britneyspears/ > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
