It is a great. Julien. Thanks. Next time I am going to post the requests to the developer groups.
Regards, Hui ----- Original Message ----- From: "Julien Nioche" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Tuesday, September 23, 2003 5:38 AM Subject: Proposition :adding minMergeDoc to IndexWriter > Hui, > > Concerning an other point of your request list I proposed a patch this week > end on the lucene-dev list and i totally forgot that this feature was > requested on the user list. > > This new feature should help you to set a number of Documents to be merged > in memory independently of the mergeFactor. > > Any comments would be appreciated > > Best regards > > Julien Nioche > http://www.lingway.com > > ---------- Debut du message initial ----------- > > De : "fp235-5" <[EMAIL PROTECTED]> > A : "lucene-dev" <[EMAIL PROTECTED]> > Copies : > Date : Sat, 20 Sep 2003 16:06:06 +0200 > Sujet : [PATCH] IndexWriter : controling the number of Docs merged > > Hello, > > Someone made a suggestion yesterday about adding a variable to IndexWriter > in > order to control the number of Documents merged in RAMDirectory > independently of > the mergeFactor. (I'm sorry I don't remember who exactly and the mail > arrived at > my office). > I'm proposing a tiny modification of IndexWriter to add this functionality. > A > variable minMergeDocs specifies the number of Documents to be merged in > memory > before starting a new Segment. The mergeFactor still control the number of > Segments created in the Directory and thus it's possible to avoid the file > number limitation problem. > > The diff file is attached. > > As noticed by Dmitry and Erik there are no true JUnit tests. I'd be OK to > write > a JUnit test for this feature. The problem is that the SegmentInfos field is > private in IndexWriter and can't be used to check the number and size of the > Segments. I ran a test using the infoStream variable of IndexWriter - > everything > seems to be OK. > > Any comments / suggestions are welcome. > > Regards > > Julien > > > > > > > > > > ----- Original Message ----- > From: "hui" <[EMAIL PROTECTED]> > To: "Lucene Users List" <[EMAIL PROTECTED]> > Sent: Monday, September 22, 2003 3:40 PM > Subject: Re: per-field Analyzer (was Re: some requests) > > > > Good work, Erik. > > > > Hui > > > > ----- Original Message ----- > > From: "Erik Hatcher" <[EMAIL PROTECTED]> > > To: "Lucene Users List" <[EMAIL PROTECTED]> > > Sent: Saturday, September 20, 2003 4:13 AM > > Subject: per-field Analyzer (was Re: some requests) > > > > > > > On Friday, September 19, 2003, at 07:45 PM, Erik Hatcher wrote: > > > > On Friday, September 19, 2003, at 11:15 AM, hui wrote: > > > >> 1. Move the Analyzer down to field level from document level so some > > > >> fields > > > >> could be applied a specail analyzer.Other fields still use the > default > > > >> analyzer from the document level. > > > >> For example, I do not need to index the number for the "content" > > > >> field. It > > > >> helps me reduce the index size a lot when I have some excel files. > > > >> But I > > > >> always need the "created_date" to be indexed though it is a number > > > >> field. > > > >> > > > >> I know there are some workarounds put in the group, but I think it > > > >> should be > > > >> a good feature to have. > > > > > > > > The "workaround" is to write a custom analyzer and and have it do the > > > > desired thing per-field. > > > > > > > > Hmmm.... just thinking out loud here without knowing if this is > > > > possible, but could a generic "wrapper" Analyzer be written that > > > > allows other analyzers to be used under the covers based on a field > > > > name/analyzer mapping? If so, that would be quite cool and save > > > > folks from having to write custom analyzers as much to handle this > > > > pretty typical use-case. I'll look into this more in the very near > > > > future personally, but feel free to have a look at this yourself and > > > > see what you can come up with. > > > > > > What about something like this? > > > > > > public class PerFieldWrapperAnalyzer extends Analyzer { > > > private Analyzer defaultAnalyzer; > > > private Map analyzerMap = new HashMap(); > > > > > > > > > public PerFieldWrapperAnalyzer(Analyzer defaultAnalyzer) { > > > this.defaultAnalyzer = defaultAnalyzer; > > > } > > > > > > public void addAnalyzer(String fieldName, Analyzer analyzer) { > > > analyzerMap.put(fieldName, analyzer); > > > } > > > > > > public TokenStream tokenStream(String fieldName, Reader reader) { > > > Analyzer analyzer = (Analyzer) analyzerMap.get(fieldName); > > > if (analyzer == null) { > > > analyzer = defaultAnalyzer; > > > } > > > > > > return analyzer.tokenStream(fieldName, reader); > > > } > > > } > > > > > > This would allow you to construct a single analyzer out of others, on a > > > per-field basis, including a default one for any fields that do not > > > have a special one. Whether the constructor should take the map or the > > > addAnalyzer method is implemented is debatable, but I prefer the > > > addAnalyzer way. Maybe addAnalyzer could return 'this' so you could > > > chain: new PerFieldWrapperAnalyzer(new > > > StandardAnalyzer).addAnalyzer("field1", new > > > WhitespaceAnalyzer()).addAnalyzer(.....). And I'm more inclined to > > > call this thing PerFieldAnalyzerWrapper instead. Any naming > > > suggestions? > > > > > > This simple little class would seem to be the answer to a very common > > > question asked. > > > > > > Thoughts? Should this be made part of the core? > > > > > > Erik > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > ---------------------------------------------------------------------------- ---- > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
