On Thu, 2010-10-21 at 05:01 +0200, Sahin Buyrukbilen wrote:
> Unfortunately both methods didn't go through. I am getting a memory error
> even when reading the directory contents.
Then your problem is probably not related to Lucene, but to the sheer
number of File objects returned by listFiles.
A Java File contains the full path name for the file. Let's say that
this is 50 characters, which translates to about (50 * 2 + 45) ~ 150
bytes for the Java String. Add an int (4 bytes) plus bookkeeping and
we're up to about 200 bytes/File.
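4.5 million Files thus take up about 1 GB. That alone is not enough to
explain the OOM, but if the full path names of your files are 150
characters, each File needs roughly (150 * 2 + 45) ~ 350 bytes for the
String, or about 400 bytes in total, and the list takes up about 2 GB.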
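
If you want to redo the arithmetic above for your own path lengths, here
is a rough sketch. The class and method names are made up for
illustration, the 45 bytes of String overhead is the approximation used
above, and the 55 bytes of per-File bookkeeping is my assumption, chosen
so that a 50-character path comes out at about 200 bytes. These are
estimates, not measurements of a real JVM.

```java
// Back-of-the-envelope estimate of the heap used by a File[] listing.
// All constants are approximations, not measured values.
class FileListEstimate {
    // Approximate fixed overhead of a Java String (figure from the text above).
    static final long STRING_OVERHEAD_BYTES = 45;
    // Assumption: int field plus object bookkeeping, rounded so that
    // a 50-character path totals about 200 bytes per File.
    static final long FILE_BOOKKEEPING_BYTES = 55;

    // Estimated bytes for one File whose full path has pathChars characters,
    // assuming 2 bytes per character.
    static long bytesPerFile(int pathChars) {
        return pathChars * 2L + STRING_OVERHEAD_BYTES + FILE_BOOKKEEPING_BYTES;
    }

    // Estimated bytes for a listing of fileCount Files.
    static long totalBytes(long fileCount, int pathChars) {
        return fileCount * bytesPerFile(pathChars);
    }

    public static void main(String[] args) {
        long files = 4_500_000L;
        // Prints 900 (MB), i.e. about 1 GB
        System.out.println(totalBytes(files, 50) / 1_000_000);
        // Prints 1800 (MB), i.e. about 2 GB
        System.out.println(totalBytes(files, 150) / 1_000_000);
    }
}
```

Compare the result with the heap you give the JVM through -Xmx to see
whether a single listFiles call can fit at all.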
> Now I am thinking this: what if I split the 4.5 million files into
> directories of 100,000 files (or fewer, depending on the Java error),
> index each of them separately and merge those indexes (if possible)?
You don't need to create separate indexes and merge them. Just split
your 4.5 million files into folders of more manageable sizes and perform
a recursive descent. Something like:
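
  public static void addFolder(IndexWriter writer, File folder) {
    File[] files = folder.listFiles();
    if (files == null) { // not a directory, or an I/O error occurred
      return;
    }
    for (File file : files) {
      if (file.isDirectory()) {
        // Descend into the subfolder; only one folder's listing is
        // needed at each level of the recursion
        addFolder(writer, file);
      } else {
        // Create Document from file and add it using the writer
      }
    }
  }
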
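
To try the traversal on its own before wiring in Lucene, here is a
minimal sketch that uses only the standard library. The class name
FolderWalker and the Consumer callback are my own choices for
illustration, and the lambda needs Java 8 or later. The callback is the
place where you would build the Document and hand it to the writer.

```java
import java.io.File;
import java.util.function.Consumer;

// Stand-alone version of the recursive descent, without any Lucene calls.
class FolderWalker {
    // Visits every non-directory entry below 'folder', one directory
    // listing at a time, and returns how many entries were passed to
    // the callback.
    static long walk(File folder, Consumer<File> onFile) {
        File[] entries = folder.listFiles();
        if (entries == null) { // not a directory, or an I/O error occurred
            return 0;
        }
        long count = 0;
        for (File entry : entries) {
            if (entry.isDirectory()) {
                count += walk(entry, onFile);
            } else {
                onFile.accept(entry); // hypothetical hook: index the file here
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Walks the folder given as first argument, or the current folder.
        File root = new File(args.length > 0 ? args[0] : ".");
        long visited = walk(root, file -> { /* create and add the Document */ });
        System.out.println(visited + " files visited under " + root);
    }
}
```

Each call only holds the listing of one folder, so with folders of
100,000 files or fewer the File arrays stay small regardless of the
total number of files.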
- Toke