[ https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289401#comment-13289401 ]
Simon Willnauer commented on LUCENE-3108:
-----------------------------------------

{quote}Hi, Simon. Can doc values be optional? I am looking into org.apache.lucene.codecs.DocValuesConsumer#merge and see that the logic assumes that for every docId we have an existing value. Or do we use the default value instead?{quote}

Hey,

DocValues are dense and assume a value for each document. However, if you don't enable DocValues on a field, nothing is stored for it, so you only store values for the fields you choose. If a field has just a small set of repeated values, DocValues can store them efficiently and deduplicate them, if that is what you are concerned about.

In general, you should ask these kinds of questions on the main dev mailing list.

simon

> Land DocValues on trunk
> -----------------------
>
>                 Key: LUCENE-3108
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3108
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: core/index, core/search, core/store
>    Affects Versions: CSF branch, 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>
>         Attachments: LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108.patch,
> LUCENE-3108.patch, LUCENE-3108.patch, LUCENE-3108_CHANGES.patch
>
>
> It's time to move another feature from branch to trunk. I want to start this
> process now while a couple of issues still remain on the branch. Currently I
> am down to a single nocommit (javadocs on DocValues.java) and a couple of
> testing TODOs (explicit multithreaded tests and unoptimized with deletions),
> but I think those are not worth separate issues, so we can resolve them as we
> go.
> The already created issues (LUCENE-3075 and LUCENE-3074) should not block
> this process here IMO; we can fix them once we are on trunk.
> Here is a quick feature overview of what has been implemented:
>  * DocValues implementations for Ints (based on PackedInts), Float 32 / 64,
> Bytes (fixed / variable size, each in sorted, straight and deref variations)
>  * Integration into the Flex API: the Codec provides a
> PerDocConsumer->DocValuesConsumer (write) / PerDocValues->DocValues (read)
>  * Enabled by default in all codecs except PreFlex
>  * Follows other Flex API patterns, e.g. non-segment readers throw UOE, forcing
> MultiPerDocValues if on DirReader etc.
>  * Integration into IndexWriter, FieldInfos etc.
>  * Random testing enabled via RandomIW - injecting random DocValues into
> documents
>  * Basic checks in CheckIndex (which runs after each test)
>  * FieldComparator for int and float variants (Sorting, currently directly
> integrated into SortField; this might go into a separate DocValuesSortField
> eventually)
>  * Extended TestSort for DocValues
>  * RAM-resident random access API plus on-disk DocValuesEnum (currently only
> sequential access) -> Source.java / DocValuesEnum.java
>  * Extensible Cache implementation for RAM-resident DocValues (by default
> loaded into RAM only once and freed once the IR is closed) -> SourceCache.java
>
> PS: Currently the RAM-resident API is named Source (Source.java), which seems
> too generic. I think we should rename it to RamDocValues or something like
> that; suggestions welcome!
> Any comments, questions (rants :)) are very much appreciated.
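
To make the "dense, with deduplication" point from the comment above concrete, here is a minimal sketch in plain Java. It is not the Lucene API and not how the branch implements DocValues: the class DenseColumnSketch and all of its methods are hypothetical names invented for this illustration. It also assumes that documents which never set a value read back a default, which the comment implies but does not spell out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only; NOT Lucene's DocValues implementation or API.
 * Models a dense per-document column: every docId up to the highest one seen
 * has a slot, unset documents read back a default (an assumption), and each
 * distinct value is stored once and referenced by ordinal.
 */
final class DenseColumnSketch {
    private final String defaultValue;
    private final List<String> uniqueValues = new ArrayList<>();   // ordinal -> value
    private final Map<String, Integer> ordinals = new HashMap<>(); // value -> ordinal
    private final List<Integer> docToOrdinal = new ArrayList<>();  // docId -> ordinal

    DenseColumnSketch(String defaultValue) {
        this.defaultValue = defaultValue;
        ordinalFor(defaultValue); // ordinal 0 is reserved for the default value
    }

    /** Returns the ordinal of an already stored value, or stores it once. */
    private int ordinalFor(String value) {
        Integer existing = ordinals.get(value);
        if (existing != null) {
            return existing;
        }
        int ord = uniqueValues.size();
        uniqueValues.add(value);
        ordinals.put(value, ord);
        return ord;
    }

    /** Sets the value for docId; any skipped docIds are padded with the default. */
    void set(int docId, String value) {
        while (docToOrdinal.size() <= docId) {
            docToOrdinal.add(0);
        }
        docToOrdinal.set(docId, ordinalFor(value));
    }

    /** Every non-negative docId yields a value; docIds past the end yield the default. */
    String get(int docId) {
        if (docId >= docToOrdinal.size()) {
            return defaultValue;
        }
        return uniqueValues.get(docToOrdinal.get(docId));
    }

    int uniqueValueCount() {
        return uniqueValues.size();
    }

    int docCount() {
        return docToOrdinal.size();
    }
}
```

With an empty-string default, setting docs 0 and 3 to "red" and doc 4 to "blue" leaves docs 1 and 2 reading back the default. The column holds five per-document slots but only three distinct stored values, which is the trade-off the comment describes: a slot for every document, with repeated values stored once.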