[ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546002 ]
Grant Ingersoll commented on LUCENE-1058:
-----------------------------------------

{quote}
What if they wanted 3 fields instead of two?
{quote}

True. I'll have to think about a more generic approach. In some sense, I think 2 is often sufficient, but you are right that it isn't totally generic in the spirit of Lucene.

To some extent, I was thinking that this could help optimize Solr's copyField mechanism. In Solr's case, I think you often have copy fields that have marginal differences in the filters that are applied. It would be useful for Solr to be able to optimize these so that it doesn't have to go through the whole analysis chain again.

{quote}
Isn't this what your current code does?
{quote}

No, in my main use case (# of buffered tokens is << # of source tokens) the only tokens kept around are the (much) smaller subset of buffered tokens. In the pre-analysis approach you have to keep both the source field tokens and the buffered tokens. Not to mention that you are increasing the work by having to iterate over the cached tokens in the list in Lucene. Thus, you have the cost of the analysis in your application plus the storage of both token lists (one large, one small, likely), and then in Lucene you have the cost of iterating over two lists. In my approach, I think, you have the cost of analysis plus the cost of storage of one list of tokens (small) and the cost of iterating that list.
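
The single-pass idea in the comment above can be illustrated with a minimal sketch in plain Java. It deliberately uses no Lucene classes: `SiphonSketch`, `SiphoningIterator`, `drain`, and the "starts with an uppercase letter" predicate are hypothetical names chosen for illustration and are not taken from the LUCENE-1058 patch. The point it tries to show is only that the source tokens are streamed through once, and that just the (small) matching subset is stored for the second field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class SiphonSketch {

    /**
     * Hypothetical stand-in for a buffering token filter: every source token
     * is passed through unchanged, and tokens matching the predicate are also
     * copied into a side buffer for later use by another field.
     */
    public static final class SiphoningIterator implements Iterator<String> {
        private final Iterator<String> source;
        private final Predicate<String> keep;
        private final List<String> buffer = new ArrayList<>();

        public SiphoningIterator(Iterator<String> source, Predicate<String> keep) {
            this.source = source;
            this.keep = keep;
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public String next() {
            String token = source.next();
            if (keep.test(token)) {
                buffer.add(token); // only the small subset is stored
            }
            return token; // the full stream is consumed, not cached
        }

        /** Tokens siphoned off so far; complete once the stream is exhausted. */
        public List<String> buffered() {
            return buffer;
        }
    }

    /** Consumes the iterator, standing in for the indexer reading the first field. */
    public static List<String> drain(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("Grant", "commented", "on", "Lucene", "today");

        // Assumption for illustration: siphon off tokens that start with an uppercase letter.
        SiphoningIterator firstField = new SiphoningIterator(
                source.iterator(),
                t -> !t.isEmpty() && Character.isUpperCase(t.charAt(0)));

        // One analysis pass feeds the first field...
        System.out.println("first field:  " + drain(firstField));
        // ...and the second field only iterates the small buffer.
        System.out.println("second field: " + firstField.buffered());
    }
}
```

In this sketch `drain` collects the pass-through tokens into a list only so they can be printed; in the scenario described above the consumer would index each token as it arrives, so the buffer is the only list that has to be kept.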

> New Analyzer for buffering tokens
> ---------------------------------
>
>                 Key: LUCENE-1058
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1058
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch
>
>
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that could siphon off certain tokens and store them in a buffer to be used later in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but all the other analysis is the same, then you could save off the tokens to be output for a different field.
> Patch to follow, but I am still not sure about a couple of things, mostly how it plays with the new reuse API.
> See http://www.gossamer-threads.com/lists/lucene/java-dev/54397?search_string=BufferingAnalyzer;#54397