[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655525#comment-13655525 ]
Jack Krupansky commented on LUCENE-4583:
----------------------------------------

bq. abusing docvalues as stored fields

Great point. I have to admit that I still don't have a 100% handle on the use case(s) for docvalues vs. stored fields, even though I've asked on the list. I mean, sometimes the chatter seems to suggest that dv is the successor to stored values. Hmmm... in that case, I should be able to store the full text of a 24 MB PDF file in a dv. Now, I know that isn't true.

Maybe we just need to start with some common use cases, based on size: tiny (16 bytes or less), small (256 or 1024 bytes or less), medium (up to 32K), and large (upwards of 1MB, and larger.) It sounds like large implies stored field.

A related "concern" is dv or stored fields that need a bias towards being in memory and in the heap, vs. a bias towards being "off heap". Maybe the size category is the hint: tiny and small bias towards on-heap, medium and certainly large bias towards off-heap.

If people are only going towards DV because they think they get off-heap, then maybe we need to reconsider the model of what DV vs. stored is really all about. But then that leads back to DV somehow morphing out of column-stride fields.

> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
>                 Key: LUCENE-4583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>   Affects Versions: 4.0, 4.1, 5.0
>            Reporter: David Smiley
>            Priority: Critical
>             Fix For: 4.4
>
>         Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes based DocValues field
> value in the docs. It appears that the limit is 32k, although I didn't get
> any friendly error telling me that was the limit. 32k is kind of small IMO;
> I suspect this limit is unintended and as such is a bug.
> The following test fails:
> {code:java}
>   public void testBigDocValue() throws IOException {
>     Directory dir = newDirectory();
>     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>     Document doc = new Document();
>     BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
>     bytes.length = bytes.bytes.length;//byte data doesn't matter
>     doc.add(new StraightBytesDocValuesField("dvField", bytes));
>     writer.addDocument(doc);
>     writer.commit();
>     writer.close();
>     DirectoryReader reader = DirectoryReader.open(dir);
>     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>     //FAILS IF BYTES IS BIG!
>     docValues.getSource().getBytes(0, bytes);
>     reader.close();
>     dir.close();
>   }
> {code}
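
For reference, here is a rough sketch that puts the size buckets floated in the comment next to the two lengths used by the quoted test. It is only an illustration, not anything from Lucene: the class, enum, and method names are hypothetical, the "small" cut-off is taken as 1024 bytes (the comment leaves it open between 256 and 1024), and everything above 32K is lumped into "large" because the comment does not say what sits between 32K and 1MB.

```java
// Hypothetical sketch: the bucket names and cut-offs follow the comment above,
// but none of these identifiers exist in Lucene.
final class DvSizeSketch {
    enum SizeCategory { TINY, SMALL, MEDIUM, LARGE }

    static final int TINY_MAX = 16;          // "16 bytes or less"
    static final int SMALL_MAX = 1024;       // assumption: upper of "256 or 1024 bytes or less"
    static final int MEDIUM_MAX = 32 * 1024; // "up to 32K"

    // Map a value length in bytes to one of the proposed buckets.
    static SizeCategory classify(int lengthInBytes) {
        if (lengthInBytes < 0) {
            throw new IllegalArgumentException("length must be >= 0");
        }
        if (lengthInBytes <= TINY_MAX) return SizeCategory.TINY;
        if (lengthInBytes <= SMALL_MAX) return SizeCategory.SMALL;
        if (lengthInBytes <= MEDIUM_MAX) return SizeCategory.MEDIUM;
        // Assumption: anything above 32K is treated as large.
        return SizeCategory.LARGE;
    }

    // The comment's hint: tiny and small lean on-heap, medium and large off-heap.
    static boolean prefersOffHeap(SizeCategory category) {
        return category == SizeCategory.MEDIUM || category == SizeCategory.LARGE;
    }

    public static void main(String[] args) {
        int works = (4 + 4) * 4096; // 32768 bytes, exactly 32K
        int fails = (4 + 4) * 4097; // 32776 bytes, 8 bytes past 32K
        for (int length : new int[] { works, fails }) {
            SizeCategory category = classify(length);
            System.out.println(length + " bytes -> " + category
                + ", off-heap bias: " + prefersOffHeap(category));
        }
    }
}
```

Under these assumptions the length that works in the test is the largest "medium" value, and the failing one is the first step past it, which matches the 32k limit described in the issue.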