As my tests show, about 1/4 of the documents are relevant for scoring per query. So for my example with 100,000 stack traces in the index, I need to score 25,000 documents. I have a native implementation of the scoring algorithm which scores all 100,000 in about 20 ms. The Lucene implementation needs >100 ms for the same query, which is really bad. Without retrieving fields it needs about 6 ms, and that is what my target should be.
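The pairwise scoring discussed in this thread compares the frames of the query trace with those of a stored trace and also checks their ordering, much like diff. As a rough illustration only (not the implementation used here), a minimal sketch could measure this with a longest common subsequence (LCS) over the frame strings. The class name `FrameSimilarity` and the normalization 2 * LCS / (|a| + |b|) are assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a diff-style similarity between two stack traces,
// each given as an ordered list of "Class.method" frame strings.
final class FrameSimilarity {

    // Length of the longest common subsequence: frames present in both
    // traces *in the same relative order* (the core of diff).
    static int lcsLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.size()][b.size()];
    }

    // Assumed normalization (Dice-style): 2 * LCS / (|a| + |b|), in [0, 1].
    static double similarity(List<String> a, List<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        return 2.0 * lcsLength(a, b) / (a.size() + b.size());
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("A.run", "B.call", "C.exec", "D.main");
        List<String> stored = Arrays.asList("A.run", "C.exec", "X.other", "D.main");
        System.out.println(similarity(query, stored)); // prints 0.75
    }
}
```

Because the LCS respects order, two traces with the same frames in reverse order score lower than identical traces, which is the ordering check described in the thread. The cost is O(|a| * |b|) per pair, so it only pays off once the frames themselves are cheap to obtain.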
I tried without LAZY_LOAD, but there is no real difference. How can I sort by docIds first? FieldCache.DEFAULT.getStrings is not an option because of the memory problem. This is how I store frames:

    for (StacktraceFrame frame : stacktrace.getFrames()) {
        doc.add(new Field(FIELD_FRAMES, frame.getClassName() + "." + frame.getMethod(),
                Store.YES, Index.NOT_ANALYZED));
    }

2010/9/9 Michael McCandless <luc...@mikemccandless.com>
> What a neat search engine! (Searching stack traces).
>
> Unfortunately, loading stored fields is slowish -- it entails 2 disk
> seeks under the hood. Really you should retrieve at most a page worth
> of docs, in the serial path of a query. How many are you retrieving
> per query?
>
> That said, you shouldn't use LAZY_LOAD if you know you will need the
> value. Also, it's possible that sorting the docIDs (ascending) first
> may get you better performance since your load is then a single scan
> of the 2 files in the index.
>
> You may want to use FieldCache.DEFAULT.getStrings instead -- this
> gives you a very fast String[], but may suck up tons of memory
> depending on how many unique frames there are (how do you index each
> frame?).
>
> Mike
>
> On Thu, Sep 9, 2010 at 4:01 AM, Johannes Lerch
> <lerch.johan...@googlemail.com> wrote:
> > Hi,
> >
> > I am working on a search for stack traces. To do this I implemented my own
> > Query, Weight and Scorer. I save exception, method and the frames as fields
> > in the index and am able to pick relevant documents by matching those fields
> > with my query stack trace (using IndexReader.termDocs()). I implemented my
> > own scoring, which is calculated pairwise for stack traces (the one of the
> > query and each of the relevant documents). For this scoring I calculate a
> > similarity between both traces by checking whether the frames exist in both
> > and also checking their ordering. This works similarly to diff on text/source
> > code.
> >
> > My problem is that I need all frames contained in both stack traces, so I
> > have to retrieve all frame fields of the stored stack traces. For now I do
> > this with:
> >
> >     Document document = reader.document(doc, new FieldSelector() {
> >         @Override
> >         public FieldSelectorResult accept(String fieldName) {
> >             if (Indexer.FIELD_FRAMES.equals(fieldName))
> >                 return FieldSelectorResult.LAZY_LOAD;
> >             else
> >                 return FieldSelectorResult.NO_LOAD;
> >         }
> >     });
> >     Fieldable[] fieldables = document.getFieldables(Indexer.FIELD_FRAMES);
> >
> > But this call decreases performance to a level that is not acceptable for me
> > (>10 times slower with 100,000 stack traces in the index). So my question is:
> > are there other ways to get stored fields, or do you have ideas for
> > workarounds? Would it be better to store all stack traces in a database and
> > retrieve them from there? If so, how do I get the docId of the stack traces I
> > wrote to the index?
> >
> > Regards,
> > Johannes
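On the question of how to sort by docIds first: one approach, sketched here under assumptions, is to collect the docIDs that pass the term matching into an int array, sort it ascending, and only then load the stored frames, so the reads move forward through the stored-field files as Mike describes. `SortedDocLoad`, `loadInDocIdOrder` and the `loader` callback are hypothetical names; in real code the callback would wrap something like `reader.document(doc, selector)` with a selector returning `FieldSelectorResult.LOAD` for the frames field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical helper: load stored frames for candidate docs in ascending docID order.
class SortedDocLoad {

    // The loader stands in for the real stored-field access (assumption).
    static Map<Integer, String[]> loadInDocIdOrder(int[] candidateDocs,
                                                   IntFunction<String[]> loader) {
        int[] sorted = candidateDocs.clone(); // keep the caller's array intact
        Arrays.sort(sorted);                  // ascending docIDs: reads only move forward
        Map<Integer, String[]> framesByDoc = new LinkedHashMap<>();
        for (int doc : sorted) {
            framesByDoc.put(doc, loader.apply(doc));
        }
        return framesByDoc;                   // iteration order is ascending docID
    }

    public static void main(String[] args) {
        List<Integer> loadOrder = new ArrayList<>();
        // Fake loader that records the order in which documents are requested.
        IntFunction<String[]> fakeLoader = doc -> {
            loadOrder.add(doc);
            return new String[] {"Frame" + doc + ".method"};
        };
        Map<Integer, String[]> frames = loadInDocIdOrder(new int[] {42, 7, 19}, fakeLoader);
        System.out.println(loadOrder);         // prints [7, 19, 42]
        System.out.println(frames.get(42)[0]); // prints Frame42.method
    }
}
```

Sorting does not reduce how many documents are loaded; it only replaces scattered seeks with a forward pass, so the actual gain depends on the index size and the OS cache and should be measured.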