Hi John,

I think you are spending too much time in I/O. Indexing into a RAMDirectory first and then flushing to disk should be faster. See http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
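To make that concrete, here is a minimal sketch of the RAM-first approach, assuming the Lucene 2.x API of the time (RAMDirectory, IndexWriter.addIndexes); the class name and the on-disk index path are only placeholders, and your existing addDocument() loop slots into the marked spot:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class RamFirstIndexer {
        public static void main(String[] args) throws Exception {
            // 1. Build the index entirely in memory.
            Directory ramDir = new RAMDirectory();
            IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
            // ... the existing addDocument() loop goes here, writing to ramWriter ...
            ramWriter.close();

            // 2. Merge the in-memory index into the on-disk index in one pass.
            //    "/path/to/index" is a placeholder, not your real index location.
            IndexWriter diskWriter = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
            diskWriter.addIndexes(new Directory[] { ramDir });
            diskWriter.optimize();
            diskWriter.close();
        }
    }

This keeps the per-document writes in RAM and turns the disk work into a single sequential merge, which is the idea behind using RAMDirectory as a staging area.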
kai

-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 13, 2007 1:57
To: java-user@lucene.apache.org
Subject: Re: Indexing correctly?

Where are your source files and index? If they're somewhere out there on the
network, you may be having some slowdown because of network latency (the part
about "/mount/....." leads me to ask this one). If this is the case, you might
get an improvement if all the files are local...

Best
Erick

On 8/11/07, John Paul Sondag <[EMAIL PROTECTED]> wrote:
>
> It takes roughly 6 hours for me to index a gig of data. The benchmarks take
> quite a bit less time, if I'm reading them correctly. I'll try out the
> StringBuffer/Builder and let you know. Thanks for the quick response, and if
> you have any more suggestions please let me know.
>
> --JP
>
> On 8/11/07, karl wettin <[EMAIL PROTECTED]> wrote:
> >
> > How much slower than anticipated is it?
> >
> > I would start by using a StringBuffer/Builder rather than appending
> > (immutable) strings to each other.
> >
> >
> > On 11 Aug 2007, at 19:05, John Paul Sondag wrote:
> >
> > > Hi,
> > >
> > > I was hoping that maybe you guys could see if I'm somehow indexing
> > > inefficiently. I'm putting the relevant parts of my code below. I've
> > > looked at the "benchmarks" page on Lucene, and my indexing is taking
> > > substantially more time than what I see posted. I'm also not sure when
> > > I should call flush() (I saw that I should be doing that on the
> > > ImproveIndexingSpeed page). I'd really appreciate any advice.
> > >
> > > Here's my code:
> > >
> > > File directory = new File("/mounts/falcon5/disks/0/tcheng3/Dataset");
> > > File[] theFiles = directory.listFiles();
> > >
> > > // go through each file inside the directory and index it
> > > for (int curFile = 0; curFile < theFiles.length; curFile++) {
> > >     File fin = theFiles[curFile];
> > >
> > >     // open up the file
> > >     FileInputStream inf = new FileInputStream(fin);
> > >     InputStreamReader isr = new InputStreamReader(inf, "US-ASCII");
> > >     BufferedReader in = new BufferedReader(isr);
> > >     String text = "";
> > >     String docid = "";
> > >
> > >     while (true) {
> > >         // read in the file one line at a time, and act accordingly
> > >         String line = in.readLine();
> > >         if (line == null) { break; }
> > >
> > >         if (line.startsWith("<DOC>")) {
> > >             // get docID
> > >             line = in.readLine();
> > >             String tempStr = line.substring(8, line.length());
> > >             int pos = tempStr.indexOf(' ');
> > >             docid = tempStr.substring(0, pos);
> > >         } else if (line.startsWith("</DOC>")) {
> > >             Document doc = new Document();
> > >             doc.add(new Field("contents", text, Field.Store.NO,
> > >                     Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS));
> > >             doc.add(new Field("DocID", docid, Field.Store.YES,
> > >                     Field.Index.NO));
> > >             writer.addDocument(doc);
> > >             text = "";
> > >         } else {
> > >             text = text + "\n" + line;
> > >         }
> > >     }
> > > }
> > >
> > > int numIndexed = writer.docCount();
> > >
> > > writer.optimize();
> > > writer.close();
> > >
> > > Thanks,
> > >
> > > --JP
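For reference, the StringBuffer/Builder change karl suggests would look roughly like the following inside the line-reading loop. This is only a sketch of the accumulation part: "in", "writer", and the Field options are taken unchanged from the code quoted above.

    StringBuilder text = new StringBuilder();
    String docid = "";

    String line;
    while ((line = in.readLine()) != null) {
        if (line.startsWith("<DOC>")) {
            // get docID
            line = in.readLine();
            String tempStr = line.substring(8);
            docid = tempStr.substring(0, tempStr.indexOf(' '));
        } else if (line.startsWith("</DOC>")) {
            Document doc = new Document();
            doc.add(new Field("contents", text.toString(), Field.Store.NO,
                    Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS));
            doc.add(new Field("DocID", docid, Field.Store.YES, Field.Index.NO));
            writer.addDocument(doc);
            text.setLength(0);    // reuse the builder for the next document
        } else {
            // append instead of text = text + "\n" + line, which copies the
            // whole accumulated string on every line
            text.append('\n').append(line);
        }
    }

StringBuilder appends in amortized constant time, so building each document's text no longer grows quadratically with the number of lines in the document.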