[ https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-510:
--------------------------------------

    Attachment: LUCENE-510.patch

Attached patch.

I modernized Marvin's original patch and added full backwards compatibility to it, so that old indices can be opened for reading or writing. New segments are written in the new format.

All tests pass. I think it's close, but I need to run performance tests now to measure the impact on indexing throughput.

I think future optimizations can carry the byte[] further, e.g., into Term and FieldCache, as Yonik mentioned. We could also fix DocumentsWriter to use byte[] for its terms storage, which would improve RAM efficiency for single-byte (ASCII) content.

I also updated the TestBackwardsCompatibility test case to properly test non-ASCII terms.

> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>
>                 Key: LUCENE-510
>                 URL: https://issues.apache.org/jira/browse/LUCENE-510
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Doug Cutting
>            Assignee: Michael McCandless
>         Attachments: LUCENE-510.patch, SortExternal.java, strings.diff,
>                      TestSortExternal.java
>
>
> We should change the format of strings written to indexes so that the length
> of the string is in bytes, not Java characters. This issue has been
> discussed at:
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> We must increment the file format number to indicate this change. At least
> the format number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until
> after 2.0 is released, to minimize incompatible changes between 1.9 and 2.0
> (other than removal of deprecated features).
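
To make the format change in the issue description concrete, here is a minimal sketch of a string written with its length counted in UTF-8 bytes rather than in Java chars. It is not the code from LUCENE-510.patch and does not use Lucene's IndexOutput or IndexInput. The class ByteLengthStringDemo and its helper methods are hypothetical names made up for this illustration, built only on java.io streams, and the variable-length integer prefix is an assumption chosen to resemble a VInt-style encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Lucene's IndexOutput/IndexInput code.
class ByteLengthStringDemo {

    // Write a non-negative int, 7 bits per byte, low-order bits first.
    // The high bit of each byte signals that more bytes follow.
    static void writeVInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    // Inverse of writeVInt.
    static int readVInt(DataInputStream in) throws IOException {
        byte b = in.readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // The length prefix counts UTF-8 bytes, not Java chars.
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVInt(out, utf8.length);
        out.write(utf8);
    }

    // The reader knows exactly how many bytes to consume before decoding.
    static String readString(DataInputStream in) throws IOException {
        byte[] utf8 = new byte[readVInt(in)];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Convenience wrappers over in-memory buffers.
    static byte[] encode(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeString(new DataOutputStream(buffer), s);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String decode(byte[] data) {
        try {
            return readString(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String term = "caf\u00e9"; // 4 Java chars, 5 UTF-8 bytes
        byte[] encoded = encode(term);
        // prints "chars=4 prefix=5 total=6"
        System.out.println("chars=" + term.length() + " prefix=" + encoded[0]
                + " total=" + encoded.length);
        System.out.println(decode(encoded).equals(term)); // prints "true"
    }
}
```

With a byte-count prefix, a reader can skip or copy a string without decoding it, which is also what makes it possible to keep terms as byte[] further down the line. For pure ASCII content the char count and the byte count are the same, so the two conventions only differ for non-ASCII terms.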