Another suggestion from me:
How about making the Token object a singleton?
> Maybe we should un-deprecate the termText() method but add javadocs
> explaining that for better performance you should use the char[] reuse
> methods instead?
>
> Mike
>
> DM Smith wrote:
>
> > Michael McCandless wrote:
> >>
> >> DM Smith wrote:
> >>
> >>> Shouldn't Term have constructors that take a Token?
> >>
> >> I think that makes sense, though normally Token appears during
> >> analysis and Term during searching (I think?) -- how often would
> >> you need to make a Term from a Token?
> >>
> > The problem I'm addressing is that tokens are used in contexts that
> > need String and not char[].
> > The call to the deprecated
> > String termText = token.termText();
> > needs to be replaced with:
> > String termText = new String(token.termBuffer(), 0,
> > token.termLength());
> >
> > There are over 170 calls to token.termText(), and each of these
> > places has to be modified. In some, perhaps many, of these cases it
> > may be possible to use char[] directly to get a performance gain.
> >
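On using char[] directly: here is a minimal sketch of what that could look like, assuming a plain char[] plus a length as a stand-in for what termBuffer() and termLength() return. The names TermBufferCompare and matches are made up for illustration and are not part of Lucene.

```java
// Hypothetical helper, not a Lucene API: compares the valid part of a
// term buffer against a String without allocating a new String.
final class TermBufferCompare {

    /** Returns true when the first `length` chars of `buffer` equal `text`. */
    static boolean matches(char[] buffer, int length, String text) {
        if (length != text.length()) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (buffer[i] != text.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The buffer may be longer than the term it currently holds.
        char[] buffer = {'t', 'h', 'e', 'x', 'x'};
        System.out.println(matches(buffer, 3, "the")); // true
        System.out.println(matches(buffer, 5, "the")); // false
    }
}
```

A check like this avoids the per-token String allocation, but it only helps in call sites that never need a String afterwards.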
> > In the case of Term, changing it to work with (char[] buffer, int
> > start, int length) does not seem quite right. I think the ripple
> > would keep getting bigger. But logically, the Term's text is the
> > text of a Token.
> >
> > To me it makes sense to have a method that returns the token as a
> > String, but that method is deprecated and the suggested replacement
> > is to directly use the buffer. So this leads to the above construct.
> > Perhaps it would be good to add a new method and document that as
> > one of two replacements.
> > public String term() {
> >   return termText != null ? termText
> >       : new String(termBuffer(), 0, termLength());
> > }
> >
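To make the proposed term() accessor concrete, here is a self-contained sketch. SketchToken is a hypothetical stand-in that only mimics the termBuffer()/termLength() accessors discussed above; it is not the real org.apache.lucene.analysis.Token, and setTermBuffer here is just an assumed way to fill the buffer.

```java
// Hypothetical stand-in for a token that keeps its text in a reusable
// char[]. This is NOT the real org.apache.lucene.analysis.Token class.
final class SketchToken {
    private char[] termBuffer = new char[16];
    private int termLength = 0;

    /** Copies text into the buffer, growing it only when needed. */
    void setTermBuffer(char[] src, int offset, int length) {
        if (termBuffer.length < length) {
            termBuffer = new char[length];
        }
        System.arraycopy(src, offset, termBuffer, 0, length);
        termLength = length;
    }

    char[] termBuffer() { return termBuffer; }

    int termLength() { return termLength; }

    /** Convenience accessor: allocates a new String on every call. */
    String term() {
        return new String(termBuffer, 0, termLength);
    }
}

class SketchTokenDemo {
    public static void main(String[] args) {
        SketchToken token = new SketchToken();
        char[] text = "lucene rocks".toCharArray();

        token.setTermBuffer(text, 0, 6);
        System.out.println(token.term()); // lucene

        // The same token instance is reused for the next term.
        token.setTermBuffer(text, 7, 5);
        System.out.println(token.term()); // rocks
    }
}
```

Callers that need a String get one line instead of the three-argument String constructor, and callers that care about allocation can still go to the buffer.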
> > Here is an example from QueryParser that has 5 instances, each
> > calling the deprecated t.termText() method. This example constructs
> > a query from a token stream.
> > Each of the problem lines follows the pattern:
> >   TermQuery currentQuery = new TermQuery(new Term(field,
> >       t.termText()));
> >
> > To remove the deprecated call to t.termText(), the Token's buffer
> > needs to be converted to a String with something like:
> >   String termText = new String(t.termBuffer(), 0,
> >       t.termLength());
> >   TermQuery currentQuery = new TermQuery(new Term(field, termText));
> >
> > /**
> >  * @exception ParseException throw in overridden method to disallow
> >  */
> > protected Query getFieldQuery(String field, String queryText)
> >     throws ParseException {
> >   // Use the analyzer to get all the tokens, and then build a
> >   // TermQuery, PhraseQuery, or nothing based on the term count
> >
> >   TokenStream source = analyzer.tokenStream(field,
> >       new StringReader(queryText));
> >   Vector v = new Vector();
> >   org.apache.lucene.analysis.Token t;
> >   int positionCount = 0;
> >   boolean severalTokensAtSamePosition = false;
> >
> >   while (true) {
> >     try {
> >       t = source.next();
> >     }
> >     catch (IOException e) {
> >       t = null;
> >     }
> >     if (t == null)
> >       break;
> >     v.addElement(t);
> >     if (t.getPositionIncrement() != 0)
> >       positionCount += t.getPositionIncrement();
> >     else
> >       severalTokensAtSamePosition = true;
> >   }
> >   try {
> >     source.close();
> >   }
> >   catch (IOException e) {
> >     // ignore
> >   }
> >
> >   if (v.size() == 0)
> >     return null;
> >   else if (v.size() == 1) {
> >     t = (org.apache.lucene.analysis.Token) v.elementAt(0);
> >     return new TermQuery(new Term(field, t.termText()));
> >   } else {
> >     if (severalTokensAtSamePosition) {
> >       if (positionCount == 1) {
> >         // no phrase query:
> >         BooleanQuery q = new BooleanQuery(true);
> >         for (int i = 0; i < v.size(); i++) {
> >           t = (org.apache.lucene.analysis.Token) v.elementAt(i);
> >           TermQuery currentQuery = new TermQuery(
> >               new Term(field, t.termText()));
> >           q.add(currentQuery, BooleanClause.Occur.SHOULD);
> >         }
> >         return q;
> >       }
> >       else {
> >         // phrase query:
> >         MultiPhraseQuery mpq = new MultiPhraseQuery();
> >         mpq.setSlop(phraseSlop);
> >         List multiTerms = new ArrayList();
> >         int position = -1;
> >         for (int i = 0; i < v.size(); i++) {
> >           t = (org.apache.lucene.analysis.Token) v.elementAt(i);
> >           if (t.getPositionIncrement() > 0
> >               && multiTerms.size() > 0) {
> >             if (enablePositionIncrements) {
> >               mpq.add((Term[])multiTerms.toArray(new Term[0]),
> >                   position);
> >             } else {
> >               mpq.add((Term[])multiTerms.toArray(new Term[0]));
> >             }
> >             multiTerms.clear();
> >           }
> >           position += t.getPositionIncrement();
> >           multiTerms.add(new Term(field, t.termText()));
> >         }
> >         if (enablePositionIncrements) {
> >           mpq.add((Term[])multiTerms.toArray(new Term[0]),
> >               position);
> >         } else {
> >           mpq.add((Term[])multiTerms.toArray(new Term[0]));
> >         }
> >         return mpq;
> >       }
> >     }
> >     else {
> >       PhraseQuery pq = new PhraseQuery();
> >       pq.setSlop(phraseSlop);
> >       int position = -1;
> >       for (int i = 0; i < v.size(); i++) {
> >         t = (org.apache.lucene.analysis.Token) v.elementAt(i);
> >         if (enablePositionIncrements) {
> >           position += t.getPositionIncrement();
> >           pq.add(new Term(field, t.termText()), position);
> >         } else {
> >           pq.add(new Term(field, t.termText()));
> >         }
> >       }
> >       return pq;
> >     }
> >   }
> > }
> >
> >
> > Here is an example that works around the deprecated code:
> > public void testShingleAnalyzerWrapperPhraseQuery() throws Exception {
> >   Analyzer analyzer = new ShingleAnalyzerWrapper(
> >       new WhitespaceAnalyzer(), 2);
> >   searcher = setUpSearcher(analyzer);
> >
> >   PhraseQuery q = new PhraseQuery();
> >
> >   TokenStream ts = analyzer.tokenStream("content",
> >       new StringReader("this sentence"));
> >   Token token;
> >   int j = -1;
> >   while ((token = ts.next()) != null) {
> >     j += token.getPositionIncrement();
> >     String termText = new String(token.termBuffer(), 0,
> >         token.termLength());
> >     q.add(new Term("content", termText), j);
> >   }
> >
> >   Hits hits = searcher.search(q);
> >   int[] ranks = new int[] { 0 };
> >   compareRanks(hits, ranks);
> > }
> >
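Both examples above start a counter at -1 and add each token's position increment to get the position passed to the query. A small sketch of just that bookkeeping, with plain ints standing in for the values getPositionIncrement() would return (PositionSketch and absolutePositions are made-up names, not Lucene code):

```java
import java.util.Arrays;

// Hypothetical illustration of the "position += increment" loop used in
// the quoted examples; the ints stand in for token position increments.
final class PositionSketch {

    /** Turns per-token position increments into absolute positions. */
    static int[] absolutePositions(int[] increments) {
        int[] positions = new int[increments.length];
        int position = -1; // so a first increment of 1 lands on position 0
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            positions[i] = position;
        }
        return positions;
    }

    public static void main(String[] args) {
        // An increment of 0 means "same position as the previous token";
        // an increment of 2 leaves a gap of one position.
        int[] increments = {1, 1, 0, 2};
        System.out.println(Arrays.toString(absolutePositions(increments)));
        // prints [0, 1, 1, 3]
    }
}
```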
> > -- DM
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> >
>
>