[ https://issues.apache.org/jira/browse/LUCENE-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2944:
--------------------------------

    Attachment: LUCENE-2944.patch

I reviewed all uses of this attribute and fixed some more problems in contrib 
and Solr.

So in my opinion there are two options:
1. Apply this patch and fix the javadoc for this expert attribute, which 
currently says that it makes a copy of the bytes.
2. Don't apply this patch, but instead change Test2BTerms and 
ICUCollationAttribute to make (useless) copies of the bytes for each term.

The indexer has no problems either way; the problem is only with other 
consumers. I'm just bringing up the second option because any performance 
gained by not copying the bytes might be negligible, and clearly it's easy to 
screw this up.
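
To make the failure mode concrete, here is a minimal, self-contained sketch 
of the aliasing problem and of the 'spare' plus copy approach. It does not 
use Lucene: Ref, ReusingProducer, collectWithoutCopy and collectWithCopy are 
hypothetical names made up for this illustration, with Ref standing in for 
BytesRef and ReusingProducer for an attribute that only sets the pointers. 
It is not the code from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BytesRefReuseDemo {

  /** Hypothetical stand-in for BytesRef: a (bytes, offset, length) view over a byte[]. */
  static final class Ref {
    byte[] bytes = new byte[0];
    int offset;
    int length;

    /** Deep copy into a private array, so later writes to other.bytes are not visible. */
    static Ref deepCopyOf(Ref other) {
      Ref copy = new Ref();
      copy.bytes = Arrays.copyOfRange(other.bytes, other.offset, other.offset + other.length);
      copy.offset = 0;
      copy.length = other.length;
      return copy;
    }

    String utf8ToString() {
      return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
  }

  /** Hypothetical producer that points the target at its own reused buffer instead of copying. */
  static final class ReusingProducer {
    private final byte[] key = new byte[16];

    void fill(String term, Ref target) {
      byte[] encoded = term.getBytes(StandardCharsets.UTF_8);
      if (encoded.length > key.length) {
        throw new IllegalArgumentException("term too long for this demo buffer");
      }
      System.arraycopy(encoded, 0, key, 0, encoded.length); // overwrites the shared buffer
      target.bytes = key;   // only the pointers are set, no copy is made
      target.offset = 0;
      target.length = encoded.length;
    }
  }

  /** Buggy consumer: keeps each filled Ref as-is, so every kept Ref shares one byte[]. */
  static List<String> collectWithoutCopy(String... terms) {
    ReusingProducer producer = new ReusingProducer();
    List<Ref> kept = new ArrayList<>();
    for (String term : terms) {
      Ref ref = new Ref();
      producer.fill(term, ref);
      kept.add(ref);
    }
    return decode(kept);
  }

  /** Fixed consumer: reuses one spare Ref and deep-copies before keeping a long-lived object. */
  static List<String> collectWithCopy(String... terms) {
    ReusingProducer producer = new ReusingProducer();
    Ref spare = new Ref();
    List<Ref> kept = new ArrayList<>();
    for (String term : terms) {
      producer.fill(term, spare);
      kept.add(Ref.deepCopyOf(spare));
    }
    return decode(kept);
  }

  private static List<String> decode(List<Ref> refs) {
    List<String> out = new ArrayList<>();
    for (Ref ref : refs) {
      out.add(ref.utf8ToString());
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(collectWithoutCopy("apple", "berry")); // prints [berry, berry]
    System.out.println(collectWithCopy("apple", "berry"));    // prints [apple, berry]
  }
}
```

In the first call both kept objects end up reading the last term, because 
they point at the same array. The second call pays one array copy per term, 
which is the cost being weighed in the two options above.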


> BytesRef reuse bugs in QueryParser and analysis.jsp
> ---------------------------------------------------
>
>                 Key: LUCENE-2944
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2944
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2944.patch, LUCENE-2944.patch
>
>
> Some code, in this case consumers of TermToBytesRefAttribute, uses BytesRef 
> as if it were a "String".
> The thing is, while our general implementation works on char[] and then 
> populates the consumer's BytesRef, not all TermToBytesRefAttribute 
> implementations do this. Specifically, ICU collation reuses the bytes and 
> simply sets the pointers:
> {noformat}
>   @Override
>   public int toBytesRef(BytesRef target) {
>     collator.getRawCollationKey(toString(), key);
>     target.bytes = key.bytes;
>     target.offset = 0;
>     target.length = key.size;
>     return target.hashCode();
>   }
> {noformat}
> Most of the blame falls on me as I added this to the queryparser in 
> LUCENE-2514.
> Attached is a patch so that these consumers re-use a 'spare' and copy the 
> bytes when they are going to make a long-lived object such as a Term.


        
