Yes, I know that Lucene should not have any document size limits. However,
all I get is a lock file inside my index folder; no other file is created
there. Then I get an OutOfMemoryError (OOM).
Please provide some guidance.
Here is the example:
package com.issue;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LiveIndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.index.MergePolicy.OneMerge;
import org.apache.lucene.index.MergeScheduler;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.util.Date;
public class D {
/** Index all text files under a directory. */
static String[] filenames;
public static void main(String[] args) {
//String indexPath = args[0];
String indexPath="D:\\Issue";//Place where indexes will be created
String docsPath="Issue"; //Place where the files are kept.
boolean create=true;
String ch="OverAll";
final File docDir = new File(docsPath);
if (!docDir.exists() || !docDir.canRead()) {
System.out.println("Document directory '"
+docDir.getAbsolutePath()+ "' does not exist or is not readable, please check the path");
System.exit(1);
}
Date start = new Date();
try {
Directory dir = FSDirectory.open(new File(indexPath));
Analyzer analyzer=new
com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
analyzer);
iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
IndexWriter writer = new IndexWriter(dir, iwc);
if(ch.equalsIgnoreCase("OverAll")){
indexDocs(writer, docDir,true);
}else{
filenames=args[2].split(",");
// indexDocs(writer, docDir);
}
writer.commit();
writer.close();
} catch (IOException e) {
System.out.println(" caught a " + e.getClass() +
"\n with message: " + e.getMessage());
}
catch(Exception e)
{
e.printStackTrace();
}
}
//Over All
static void indexDocs(IndexWriter writer, File file,boolean flag)
throws IOException {
FileInputStream fis = null;
if (file.canRead()) {
if (file.isDirectory()) {
String[] files = file.list();
// an IO error could occur
if (files != null) {
for (int i = 0; i < files.length; i++) {
indexDocs(writer, new File(file, files[i]),flag);
}
}
} else {
try {
fis = new FileInputStream(file);
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace();
return; // nothing to index; avoids a NullPointerException on fis below
}
try {
Document doc = new Document();
Field pathField = new StringField("path", file.getPath(),
Field.Store.YES);
doc.add(pathField);
doc.add(new LongField("modified", file.lastModified(),
Field.Store.NO));
doc.add(new StringField("name",file.getName(),Field.Store.YES));
doc.add(new TextField("contents", new BufferedReader(new
InputStreamReader(fis, "UTF-8"))));
LineNumberReader lnr=new LineNumberReader(new FileReader(file));
String line=null;
while( null != (line = lnr.readLine()) ){
doc.add(new StringField("SC",line.trim(),Field.Store.YES));
// doc.add(new Field("contents",line,Field.Store.YES,Field.Index.ANALYZED));
}
if (writer.getConfig().getOpenMode() ==
OpenMode.CREATE_OR_APPEND) {
writer.addDocument(doc);
writer.commit();
fis.close();
} else {
try
{
writer.updateDocument(new Term("path", file.getPath()), doc);
fis.close();
}catch(Exception e)
{
writer.close();
fis.close();
e.printStackTrace();
}
}
}catch (Exception e) {
writer.close();
fis.close();
e.printStackTrace();
}finally {
// writer.close();
fis.close();
}
}
}
}
}
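Note that the code above builds a single Document per file and adds one stored "SC" field for every line, so all lines of a large file are held on the heap before addDocument is called. For comparison, here is a minimal sketch of the one-small-document-per-line approach mentioned in the quoted reply below. It uses only the JDK: LineSink is a hypothetical stand-in for whatever builds a small Document and hands it to the IndexWriter, and the class and method names are made up for illustration, not part of Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class PerLineIndexingSketch {

    /** Hypothetical callback; in real code this would build one small Document per line. */
    public interface LineSink {
        void accept(String path, long lineNumber, String line) throws IOException;
    }

    /**
     * Streams the source line by line and hands each non-empty line to the sink,
     * so only one line is held in memory at a time. Returns the number of lines emitted.
     */
    public static long indexLines(String path, Reader source, LineSink sink) throws IOException {
        long lineNumber = 0;
        long emitted = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines rather than indexing empty values
                }
                sink.accept(path, lineNumber, trimmed);
                emitted++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) throws IOException {
        List<String> seen = new ArrayList<>();
        long n = indexLines("demo.log", new StringReader("first line\n\n  second line  \n"),
                (path, lineNo, line) -> seen.add(path + ":" + lineNo + ":" + line));
        System.out.println(n + " lines handed to the sink: " + seen);
    }
}
```

In a real indexer the sink would presumably add the path, the line number and the line text as fields of a fresh Document, and the writer would be committed once after all files are processed rather than after every document.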
On 8/29/2013 4:20 PM, Michael McCandless wrote:
Lucene doesn't have document size limits.
There are default limits for how many tokens the highlighters will process ...
But, if you are passing each line as a separate document to Lucene,
then Lucene only sees a bunch of tiny documents, right?
Can you boil this down to a small test showing the problem?
Mike McCandless
http://blog.mikemccandless.com
On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka
<ankit.mura...@rancoretech.com> wrote:
Hello all,
I am facing a typical issue.
I have many files which I am indexing.
Problem faced:
a. Files smaller than 20 MB are successfully indexed and merged.
b. Files larger than 20 MB are not getting indexed. No exception is
thrown. Only a lock file is created in the index directory. The
indexing process for a single file exceeding 20 MB continues for more
than 8 minutes, after which my code merges the generated index into the
existing index.
Since no index is being generated now, I get an exception during the
merging process.
Why are files larger than 20 MB not being indexed? I am indexing each
line of the file. Why is IndexWriter not throwing any error?
Do I need to change any parameter in Lucene or tweak the Lucene settings?
Lucene version is 4.4.0
My current deployment for Lucene is on a server running with 128 MB and 512
MB heap.
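Since the heap size matters for the OOM discussed in this thread, it may help to confirm what the indexing JVM actually received. The following is a small sketch using only the standard Runtime API; the class name HeapReport and the helper toMiB are made up for illustration.

```java
public class HeapReport {

    /** Formats a byte count as whole mebibytes (rounded down). */
    public static String toMiB(long bytes) {
        return (bytes / (1024L * 1024L)) + " MiB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the upper bound the JVM will try to use for the heap
        System.out.println("max heap:   " + toMiB(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM
        System.out.println("total heap: " + toMiB(rt.totalMemory()));
        // freeMemory(): unused part of the currently reserved heap
        System.out.println("free heap:  " + toMiB(rt.freeMemory()));
    }
}
```

Running this with the same JVM options as the indexer shows whether the intended maximum heap is in effect.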
--
Regards
Ankit Murarka
"What lies behind us and what lies before us are tiny matters compared with
what lies within us"
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org