Hi experts,

I'm upgrading to Lucene 4.4 and trying to use DocValues instead of stored fields for performance reasons. Because the index size is unknown (it depends on the customer), I want to use DiskDocValuesFormat, especially for some binary fields. So I wrote a customized Codec:

    import org.apache.lucene.codecs.Codec;
    import org.apache.lucene.codecs.DocValuesFormat;
    import org.apache.lucene.codecs.diskdv.DiskDocValuesFormat; // from the lucene-codecs module
    import org.apache.lucene.codecs.lucene42.Lucene42Codec;
    import org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat;

    final Codec codec = new Lucene42Codec() {
        private final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
        private final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();

        @Override
        public DocValuesFormat getDocValuesFormatForField(String field) {
            // Keep the potentially large fields on disk; everything else stays in memory.
            if (LucenePluginConstants.INDEX_STORED_RETURNABLE_FIELD.equals(field)
                    || LucenePluginConstants.PAYLOAD_FIELD_NAME.equals(field)
                    || LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE.equals(field)) {
                return diskDVFormat;
            } else {
                return memoryDVFormat;
            }
        }
    };
    iwc.setCodec(codec);

Here the field LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE is a numeric field of type long. The others are binary.

I then consume the doc values as in the pseudo-code below:

    nodeIDDocValuesSource = MultiDocValues.getNumericValues(searcher.getIndexReader(), LucenePluginConstants.INDEX_NODE_ID_DOC_VALUE);
    ......
    long nodeId = nodeIDDocValuesSource.get(scoreDoc.doc);

I am sure the nodeId I get back is wrong: the upper-level logic verifies it and treats it as data corruption. But if I switch the long field to memoryDVFormat, everything is OK.

Also, to support upgrading legacy data, I keep two index formats, DocValues or stored fields, selected by version. If I use stored fields, everything is OK.

So I guess there is a bug in DiskDocValuesFormat for the numeric data type. Is it related to the byte-aligned numeric compression? Or am I not using DiskDocValuesFormat correctly? It does not seem to take any other parameters.

Sorry that I have no pure Lucene test case yet. I hope someone can shed some light on this.

Best regards,
Duke

If not now, when? If not me, who?