[ https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-2380:
---------------------------------------
Attachment: LUCENE-2380.patch
OK, I fixed up the patch. I think it's ready to commit, though it'd be
great if someone could double-check my Solr changes:
* Updated to trunk
* Fixed bug in Solr's ByteUtils.java (it was not respecting the
offset in the incoming BytesRef)
* Added an optional boolean "fasterButMoreRAM" option when loading
the field cache; it defaults to true
* For DocTermsIndex, I defined ord=0 to mean "unset", and made it
the caller's responsibility to handle the ord=0 case if an empty
(length=0) BytesRef isn't acceptable. Likewise, for DocTerms, I now
directly return an empty BytesRef if the doc didn't have this
field, but I also added an exists method so you can check
explicitly if you need to.
* Added a getTerm convenience method (calls getOrd then lookup, by
default) to the terms index; renamed DocTerms.get -> getTerm for
consistency
* Fixed the nocommits and/or changed to TODOs
* Small cleanups
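
To make the ord=0 convention and the offset fix above concrete, here is
a minimal sketch in plain Java. It does not use the patch's classes:
ToyTermsIndexSketch, ByteSlice and ToyDocTermsIndex are hypothetical
stand-ins, and the packed layout (one shared byte[] plus a start-offset
array) is only an assumption about how such a cache could be laid out.
Only the method names getOrd, lookup and getTerm are taken from the
list above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-ins for illustration only; these are
// NOT the BytesRef / DocTermsIndex classes from the patch.
final class ToyTermsIndexSketch {

    /** A term as a slice of a shared byte[]; only [offset, offset + length) is valid. */
    static final class ByteSlice {
        final byte[] bytes;
        final int offset;
        final int length;

        ByteSlice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        /** Decodes the slice as UTF-8, starting at offset rather than at index 0. */
        String utf8ToString() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Maps docID -> ord -> term bytes; ord 0 is reserved to mean "unset". */
    static final class ToyDocTermsIndex {
        private final byte[] pool;     // all term bytes, concatenated in sorted order
        private final int[] termStart; // term for ord spans termStart[ord]..termStart[ord + 1]
        private final int[] docToOrd;  // 0 for docs that have no term in the field

        ToyDocTermsIndex(String[] sortedTerms, int[] docToOrd) {
            int n = sortedTerms.length;
            byte[][] encoded = new byte[n][];
            int total = 0;
            for (int i = 0; i < n; i++) {
                encoded[i] = sortedTerms[i].getBytes(StandardCharsets.UTF_8);
                total += encoded[i].length;
            }
            pool = new byte[total];
            // termStart[0] == termStart[1] == 0, so ord 0 resolves to an empty slice.
            termStart = new int[n + 2];
            int pos = 0;
            for (int i = 0; i < n; i++) {
                termStart[i + 1] = pos; // sortedTerms[i] gets ord i + 1
                System.arraycopy(encoded[i], 0, pool, pos, encoded[i].length);
                pos += encoded[i].length;
            }
            termStart[n + 1] = pos;
            this.docToOrd = docToOrd.clone();
        }

        int getOrd(int docID) {
            return docToOrd[docID];
        }

        ByteSlice lookup(int ord) {
            return new ByteSlice(pool, termStart[ord], termStart[ord + 1] - termStart[ord]);
        }

        /** Convenience method: getOrd followed by lookup. */
        ByteSlice getTerm(int docID) {
            return lookup(getOrd(docID));
        }
    }

    public static void main(String[] args) {
        // Doc 1 has no term in the field, so its ord is 0.
        ToyDocTermsIndex index =
            new ToyDocTermsIndex(new String[] {"apple", "banana"}, new int[] {2, 0, 1});
        for (int doc = 0; doc < 3; doc++) {
            int ord = index.getOrd(doc);
            // The caller decides what ord 0 means for it.
            String shown = ord == 0 ? "(unset)" : index.getTerm(doc).utf8ToString();
            System.out.println("doc " + doc + " ord=" + ord + " term=" + shown);
        }
    }
}
```

Every slice after the first term has a non-zero offset into the shared
array, which is why code that decodes from index 0 instead of from
offset returns the wrong bytes.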
I've also added a MIGRATE.txt that spells out more details on how an
app can cut over to the new APIs.
I think there are some other good things to do here, but as a future
issue (this one's big enough!) -- I'll open it:
* For DocTermsIndex, make it optional whether the bytes data is
loaded. E.g. for a single-segment index (LUCENE-2335), or for apps
whose sort comparators do not need the bytes data (e.g. because
they use the terms dict to resolve ord -> term, and vice versa).
* Possibly merge DocTerms & DocTermsIndex. E.g. it's dangerous today
to load terms and then termsIndex, because you waste tons of RAM;
it'd be nicer to have a single cache entry that could "upgrade"
itself to be an index (i.e. gain the ords).
> Add FieldCache.getTermBytes, to load term data as byte[]
> --------------------------------------------------------
>
> Key: LUCENE-2380
> URL: https://issues.apache.org/jira/browse/LUCENE-2380
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.0
>
> Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch,
> LUCENE-2380.patch
>
>
> With flex, a term is now an opaque byte[] (typically a UTF-8 encoded Unicode
> string, but not necessarily), so we need to push this up the search stack.
> FieldCache now has getStrings and getStringIndex; we need corresponding
> methods to load terms as native byte[], since in general they may not be
> representable as String. This should also be quite a bit more RAM efficient
> for US-ASCII content, since each character would then use 1 byte, not 2.
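
The 1-byte-versus-2 claim in the description above can be checked with a
small Java sketch. TermRamSketch and its two helpers are hypothetical
names, and the comparison counts only the character payload (UTF-8 bytes
versus 2 bytes per Java char); it ignores object headers and any other
per-entry overhead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Lucene or the patch.
final class TermRamSketch {

    /** Payload size when the term is kept as UTF-8 bytes. */
    static int utf8Bytes(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Payload size when the term is kept as 16-bit Java chars. */
    static int utf16Bytes(String term) {
        return term.length() * 2;
    }

    public static void main(String[] args) {
        String[] terms = {"lucene", "fieldcache", "caf\u00e9"};
        for (String term : terms) {
            System.out.println(term + ": utf8=" + utf8Bytes(term)
                + " bytes, chars=" + utf16Bytes(term) + " bytes");
        }
    }
}
```

For pure ASCII terms the UTF-8 payload is half the char payload; the
saving shrinks as the share of non-ASCII characters grows.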
--
This message is automatically generated by JIRA.