Also, after changing some settings on IndexWriterConfig and LiveIndexWriterConfig, I now get the following exception:
20:31:23,540 INFO java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8WithHash(UnicodeUtil.java:136)
    at org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.fillBytesRef(CharTermAttributeImpl.java:91)
    at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:185)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:165)
    at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
    at com.rancore.MainClass1.indexDocs(MainClass1.java:220)
    at com.rancore.MainClass1.indexDocs(MainClass1.java:167)
    at com.rancore.MainClass1.main(MainClass1.java:110)

java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
    at com.rancore.MainClass1.main(MainClass1.java:136)

Can anyone please guide? There has to be some way a file of, say, 20 MB can be properly indexed. Any guidance is highly appreciated.

On 8/30/2013 6:49 PM, Ankit Murarka wrote:
Hello,

The following exception is being printed on the server console when trying to index. As usual, the indexes are not getting created.

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:148)
    at org.apache.lucene.util.AttributeSource.<init>(AttributeSource.java:128)
    at org.apache.lucene.analysis.TokenStream.<init>(TokenStream.java:91)
    at org.apache.lucene.document.Field$StringTokenStream.<init>(Field.java:568)
    at org.apache.lucene.document.Field.tokenStream(Field.java:541)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
    at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1188)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1169)
    at com.rancore.MainClass1.indexDocs(MainClass1.java:197)
    at com.rancore.MainClass1.indexDocs(MainClass1.java:153)
    at com.rancore.MainClass1.main(MainClass1.java:95)

java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
    at com.rancore.MainClass1.main(MainClass1.java:122)
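[Editorial aside: both traces show the heap filling up while document inversion copies term text, i.e. while whole-file field values are being tokenized. The following is a minimal, Lucene-free sketch of why holding a 20 MB file as one in-memory String is so expensive; the 20 MB figure and the mostly-ASCII assumption are illustrative, not taken from the thread.]

```java
import java.nio.charset.StandardCharsets;

public class HeapCostSketch {
    public static void main(String[] args) {
        // A Java String stores text as UTF-16: 2 bytes per ASCII character.
        String line = "the quick brown fox jumps over the lazy dog";
        byte[] utf8 = line.getBytes(StandardCharsets.UTF_8);

        // For ASCII text the in-heap char[] is twice the UTF-8 size.
        System.out.println("UTF-16 chars: " + line.length() * 2 + " bytes");
        System.out.println("UTF-8 bytes:  " + utf8.length + " bytes");

        // So a 20 MB ASCII file read into one String needs ~40 MB of char
        // data alone, before the indexer buffers its own copy of each term.
        long fileBytes = 20L * 1024 * 1024;
        long utf16Bytes = fileBytes * 2; // assuming mostly ASCII content
        System.out.println("20 MB file as one String: ~"
            + utf16Bytes / (1024 * 1024) + " MB of char data");
    }
}
```

The per-term buffers and the extra stored copies of every line multiply that baseline further, which is why the dump lands in UTF-16-to-UTF-8 conversion.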
Indexing to directory

Any guidance will be highly appreciated! Server opts are: -server -Xms8192m -Xmx16384m -XX:MaxPermSize=512m

On 8/30/2013 3:13 PM, Ankit Murarka wrote:

Hello. The server has much more memory; I have given a minimum of 8 GB to the application server. The Java opts of interest are: -server -Xms8192m -Xmx16384m -XX:MaxPermSize=8192m

Even after giving the server this much memory, how am I hitting OOM exceptions? No other activity is being performed on the server apart from this. Checking from JConsole, the maximum heap during indexing was close to 1.2 GB, whereas the memory allocated is as mentioned above. I did mention 128 MB earlier, but that is when I start the server on a normal Windows machine.

Isn't there any property/configuration in Lucene which I should set in order to index large files, say about 30 MB? I read something about MergeFactor etc. but was not able to set any value for it, and I don't even know whether doing that will help.

On 8/29/2013 7:04 PM, Ian Lea wrote:

Well, I use neither Eclipse nor your application server and can offer no advice on any differences in behaviour between the two. Maybe you should try Eclipse or app server forums. If you are going to index the complete contents of a file as one field you are likely to hit OOM exceptions. How big is the largest file you are ever going to index? The server may have 8GB, but how much memory are you allowing the JVM? What are the command line flags? I think you mentioned 128Mb in an earlier email. That isn't much.

--
Ian.

On Thu, Aug 29, 2013 at 2:14 PM, Ankit Murarka <ankit.mura...@rancoretech.com> wrote:

Hello, I get the exception only when the code is fired from Eclipse. When it is deployed on an application server, I get no exception at all. This forced me to invoke the same code from Eclipse and check what the issue is. I ran the code on a server with 8 GB memory; even then no exception occurred!
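[Editorial aside: Ian's warning about indexing a whole file as one field applies to any code path that materializes the file as a single String. Reading through a BufferedReader instead keeps heap use proportional to the longest line, not the file size. A minimal stdlib-only sketch; the temp file and its contents are made up, and the loop body stands in for whatever per-line work (e.g. adding to an index) the real code does.]

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingReadSketch {

    // Process a file line by line; only one line is ever held in memory.
    static long countNonBlankLines(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    count++; // handle (e.g. index) one line here
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".log");
        Files.write(tmp, "one\n\ntwo\nthree\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(countNonBlankLines(tmp)); // prints 3
        Files.delete(tmp);
    }
}
```

The same shape works for a 20 MB or 200 MB file without growing the heap, which is the property the thread is after.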
Only write.lock is formed. Removing the contents field is not desirable, as it is needed for search to work correctly.

On 8/29/2013 6:17 PM, Ian Lea wrote:

So you do get an exception after all: OOM. Try it without this line:

    doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-8"))));

I think that will slurp the whole file in one go, which will obviously need more memory on larger files than on smaller ones. Or just run the program with more memory.

--
Ian.

On Thu, Aug 29, 2013 at 1:05 PM, Ankit Murarka <ankit.mura...@rancoretech.com> wrote:

Yes, I know that Lucene should not have any document size limits. All I get is a lock file inside my index folder; along with this there is no other file inside the index folder. Then I get an OOM exception. Please provide some guidance. Here is the example:

package com.issue;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.util.Date;

public class D {

    /** Index all text files under a directory. */
    static String[] filenames;

    public static void main(String[] args) {
        // String indexPath = args[0];
        String indexPath = "D:\\Issue"; // place where indexes will be created
        String docsPath = "Issue";      // place where the files are kept
        String ch = "OverAll";

        final File docDir = new File(docsPath);
        if (!docDir.exists() || !docDir.canRead()) {
            System.out.println("Document directory '" + docDir.getAbsolutePath()
                + "' does not exist or is not readable, please check the path");
            System.exit(1);
        }

        Date start = new Date();
        try {
            Directory dir = FSDirectory.open(new File(indexPath));
            Analyzer analyzer = new com.rancore.demo.CustomAnalyzerForCaseSensitive(Version.LUCENE_44);
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44, analyzer);
            iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
            IndexWriter writer = new IndexWriter(dir, iwc);

            if (ch.equalsIgnoreCase("OverAll")) {
                indexDocs(writer, docDir, true);
            } else {
                filenames = args[2].split(",");
                // indexDocs(writer, docDir);
            }
            writer.commit();
            writer.close();
        } catch (IOException e) {
            System.out.println(" caught a " + e.getClass()
                + "\n with message: " + e.getMessage());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Over All
    static void indexDocs(IndexWriter writer, File file, boolean flag) throws IOException {
        if (!file.canRead()) {
            return;
        }
        if (file.isDirectory()) {
            String[] files = file.list(); // an IO error could occur
            if (files != null) {
                for (int i = 0; i < files.length; i++) {
                    indexDocs(writer, new File(file, files[i]), flag);
                }
            }
        } else {
            FileInputStream fis = null;
            try {
                fis = new FileInputStream(file);

                Document doc = new Document();
                doc.add(new StringField("path", file.getPath(), Field.Store.YES));
                doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));
                doc.add(new StringField("name", file.getName(), Field.Store.YES));
                doc.add(new TextField("contents", new BufferedReader(
                    new InputStreamReader(fis, "UTF-8"))));

                LineNumberReader lnr = new LineNumberReader(new FileReader(file));
                String line = null;
                while (null != (line = lnr.readLine())) {
                    doc.add(new StringField("SC", line.trim(), Field.Store.YES));
                    // doc.add(new Field("contents", line, Field.Store.YES, Field.Index.ANALYZED));
                }

                if (writer.getConfig().getOpenMode() == OpenMode.CREATE_OR_APPEND) {
                    writer.addDocument(doc);
                    writer.commit();
                } else {
                    writer.updateDocument(new Term("path", file.getPath()), doc);
                }
            } catch (FileNotFoundException fnfe) {
                fnfe.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (fis != null) {
                    fis.close();
                }
            }
        }
    }
}

On 8/29/2013 4:20 PM, Michael McCandless wrote:

Lucene doesn't have document size limits. There are default limits for how many tokens the highlighters will process. But, if you are passing each line as a separate document to Lucene, then Lucene only sees a bunch of tiny documents, right? Can you boil this down to a small test showing the problem?

Mike McCandless
http://blog.mikemccandless.com

On Thu, Aug 29, 2013 at 1:51 AM, Ankit Murarka <ankit.mura...@rancoretech.com> wrote:

Hello all,

Faced with a typical issue. I have many files which I am indexing.

Problem faced:

a. Files having size less than 20 MB are successfully indexed and merged.
b. Files having size greater than 20 MB are not getting indexed. No exception is being thrown; only a lock file is being created in the index directory.

The indexing process for a single file exceeding 20 MB continues for more than 8 minutes, after which I have code which merges the generated index into the existing index. Since no index is being generated now, I get an exception during the merging process. Why are files larger than 20 MB not being indexed?
I am indexing each line of the file. Why is IndexWriter not throwing any error? Do I need to change any parameter in Lucene or tweak the Lucene settings? The Lucene version is 4.4.0. My current deployment for Lucene is on a server running with 128 MB and 512 MB heap.

--
Regards
Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what lies within us"

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
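[Editorial aside: Mike's remark points at the fix — make each line, or each small batch of lines, its own document instead of piling every line of a 20 MB file into one Document. The sketch below models just the batching shape in plain Java; the batch size and the Consumer sink are hypothetical stand-ins for per-document addDocument calls.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LineBatcher {

    // Accumulate lines into small batches and hand each batch off;
    // in real indexing code the sink would add one document per batch.
    static int flushInBatches(List<String> lines, int batchSize, Consumer<List<String>> sink) {
        List<String> batch = new ArrayList<>();
        int flushes = 0;
        for (String line : lines) {
            batch.add(line);
            if (batch.size() == batchSize) {
                sink.accept(new ArrayList<>(batch)); // hand off a copy
                batch.clear();
                flushes++;
            }
        }
        if (!batch.isEmpty()) { // flush the remainder
            sink.accept(new ArrayList<>(batch));
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("l1", "l2", "l3", "l4", "l5");
        int flushes = flushInBatches(lines, 2, b -> System.out.println("batch of " + b.size()));
        System.out.println(flushes + " flushes"); // 3 flushes: 2 + 2 + 1
    }
}
```

With this shape, peak memory is bounded by the batch size regardless of file size, which is why the "bunch of tiny documents" approach does not hit OOM where the one-giant-document approach does.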