Somehow I missed the very first message in this thread. James, did you upload a patch with your changes to JIRA?
Otis

----- Original Message ----
From: James Kennedy <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, February 23, 2007 1:02:38 PM
Subject: Re: [jira] Field constructor, avoiding String.intern()

In our case, we're trying to optimize document() retrieval, and we found that
disabling the String interning in the Field constructor improved performance
dramatically. I agree that interning should be an option on the constructor.

For document retrieval, at least for a small number of fields, the
performance gain of using equals() on interned strings is no match for the
performance loss of interning the field name of each field.

Wolfgang Hoschek-2 wrote:
>
> I noticed that, too, but in my case the difference was often much
> more extreme: it was one of the primary bottlenecks on indexing. This
> is the primary reason why MemoryIndex.addField(...) navigates around
> the problem by taking a parameter of type "String fieldName" instead
> of type "Field":
>
>    public void addField(String fieldName, TokenStream stream) {
>      /*
>       * Note that this method signature avoids having a user call new
>       * o.a.l.d.Field(...) which would be much too expensive due to the
>       * String.intern() usage of that class.
>       */
>
> Wolfgang.
>
> On Feb 14, 2006, at 1:42 PM, Tatu Saloranta wrote:
>
>> After profiling in-memory indexing, I noticed that
>> calls to String.intern() showed up surprisingly high;
>> especially the one from the Field() constructor. This is
>> understandable due to the overhead String.intern() has
>> (being a native and synchronized method; overhead
>> incurred even if the String is already interned), and the
>> fact that this essentially gets called once per
>> document+field combination.
>>
>> Now, it would be quite easy to improve things a bit
>> (in theory), such that most intern() calls could be
>> avoided, transparent to the calling app; for example,
>> for each IndexWriter() one could use a simple
>> HashMap() for caching interned Strings. This approach
>> is more than twice as fast as directly calling
>> intern(). One could also use a per-thread cache, or a
>> global one; all of which would probably be faster.
>> However, the Field constructor hard-codes the call to
>> intern(), so it would be necessary to add a new
>> constructor that indicates that the field name is known to
>> be interned.
>> And there would also need to be a way to invoke the
>> new optional functionality.
>>
>> Has anyone tried this approach to see if the speedup is
>> worth the hassle (in my case it'd probably be
>> something like 2 - 3%, assuming the profiler's 5% for
>> intern() is accurate)?
>>
>> -+ Tatu +-
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>

--
View this message in context: http://www.nabble.com/Field-constructor%2C-avoiding-String.intern%28%29-tf1123597.html#a9123600
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
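
For anyone reading this thread later, the HashMap cache Tatu describes could look roughly like the sketch below. FieldNameCache is a hypothetical name, not a Lucene class, and the sketch assumes one cache per IndexWriter (or per thread), so it does no synchronization. It only shows how repeated String.intern() calls can be avoided; Field would still need a constructor that accepts an already-interned name, as discussed above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Lucene): remembers the interned instance
 * of each field name so that String.intern() is called only the first time
 * a given name is seen.
 */
final class FieldNameCache {
    // Not thread-safe; assumed to be owned by a single writer or thread.
    private final Map<String, String> cache = new HashMap<String, String>();

    /** Returns the interned instance that is equal to {@code name}. */
    String intern(String name) {
        String canonical = cache.get(name);
        if (canonical == null) {
            canonical = name.intern(); // pay for intern() once per distinct name
            cache.put(canonical, canonical);
        }
        return canonical;
    }

    public static void main(String[] args) {
        FieldNameCache names = new FieldNameCache();
        // new String(...) forces two distinct, non-interned instances.
        String first = names.intern(new String("title"));
        String second = names.intern(new String("title"));
        System.out.println(first == second); // both refer to the same instance
    }
}
```

The cached value is the same instance String.intern() would return, so code that compares field names by reference keeps working. Whether the lookup actually beats calling intern() directly depends on the JVM and workload, so it would need to be measured.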