Thanks Jian I need to retrive the original document sometimes. I did not quite understand your second suggestion. Can you please help me understand better, a pointer to some web resource will also help.
jian chen <[EMAIL PROTECTED]> wrote: Hi, Depending on the operating system, there might be a hard limit on the number of files in one directory (windoze versions). Even with operating systems that don't have a hard limit, it is still better not to put too many files in one directory (linux). Typically, the file system won't be very efficient in terms of file retrieval if there are nore than couple thousand files in one directory. There are some ways to tackle this issue. 1) Use a hash function to distribute the files to different sub directories based on the file name. For example, use the MD5 algorithm in Java or CRC algorithm in java, hash the file name to a number, use this number to construct directory. For example, if the number you hashed is 123456, then, you can make 123 as a sub-dir name, and 456 as the sub-sub dir name, so forth. I think the SQUID web proxy server uses this approach to do the file caching. 2) Why not use Lucene's indexing algorithm and store binary files with lucene index?! I love the indexing algorithm, in that, you don't need to manage the free space like that in a typical file system. Because the merge process will take care of reclaiming the free space automatically. Should these two advices be good? Jian On 6/29/05, bib_lucene bib wrote: > Hi All > > In my webapp i have people uploading their documents. My server is > windows/tomcat. I am thinking there will be a limit on the no of files in a > directory. Typically apllication users will load 3-5 page word docs. > > 1. How does one design the system such that there will not be any problem as > the users keep uploading their files, even if a million files are reached. > 2. Is there a sample application that does this. > 3. Should I have lucene update index after each upload or should I do it like > once a day. > > Thanks > Bib > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com