Refactoring Lucene to Variable-Width DocIds

Ed Kohlwey Fri, 05 Jul 2013 12:19:18 -0700

Many distributed systems (e.g. Git, Dynamo) use a 16-byte or greater
pseudorandom (or random) identifier for documents.


It would be nice to refactor Lucene to return a variable-width document ID
so that indices could be implemented over databases such as HBase,
Accumulo, Cassandra, etc. using a large, non-sequential identifier instead
of the current system, which requires IDs to be sequential and 4 bytes wide.
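
To make the idea concrete, here is a minimal sketch of what such an
identifier might look like. VarDocId and its methods are hypothetical names
invented for illustration; nothing like this exists in Lucene today. The
sketch assumes IDs are compared as unsigned bytes in lexicographic order,
which is how HBase sorts row keys.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

/** Hypothetical variable-width document ID. Not part of Lucene's API. */
final class VarDocId implements Comparable<VarDocId> {
    private final byte[] bytes;

    VarDocId(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the ID immutable
    }

    /** Wraps a 4-byte int doc ID (big-endian), as in the current system. */
    static VarDocId fromInt(int docId) {
        return new VarDocId(ByteBuffer.allocate(4).putInt(docId).array());
    }

    /** Wraps a 16-byte random identifier such as a UUID. */
    static VarDocId fromUuid(UUID id) {
        return new VarDocId(ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array());
    }

    /** Width of this ID in bytes. */
    int width() {
        return bytes.length;
    }

    /** Unsigned lexicographic byte order; a shorter prefix sorts first. */
    @Override
    public int compareTo(VarDocId other) {
        int n = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(bytes[i] & 0xff, other.bytes[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(bytes.length, other.bytes.length);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof VarDocId && Arrays.equals(bytes, ((VarDocId) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

With a type like this, existing int IDs would go through
VarDocId.fromInt(docId), while a store-backed index could hand back
VarDocId.fromUuid(UUID.randomUUID()) or any other byte string. The hard part
of the refactoring would be everything that currently assumes a dense,
sequential int, which this sketch does not address.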

Has anyone thought about doing this? Is there interest in such a
refactoring or prototype?
