Many distributed systems (e.g. Git, Dynamo) use a pseudorandom (or random) identifier of 16 bytes or more for documents.
It would be nice to refactor Lucene to return a variable-width document ID, so that indices could be implemented over databases such as HBase, Accumulo, Cassandra, etc. using a large, non-sequential identifier. The current system requires IDs to be sequential and 4 bytes wide. Has anyone thought about doing this? Is there interest in such a refactoring or a prototype?
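
To make the idea a bit more concrete, here is a rough sketch of what a variable-width ID type might look like. Everything in it is hypothetical: `DocKey`, `fromInt`, and `fromUuid` are names made up for illustration and are not part of Lucene's API. The only assumption is that an ID is an opaque byte string ordered by unsigned lexicographic comparison, so today's 4-byte IDs and 16-byte random IDs could share one type.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.UUID;

/** Hypothetical variable-width document ID; NOT an existing Lucene class. */
final class DocKey implements Comparable<DocKey> {
    private final byte[] bytes;

    DocKey(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the key immutable
    }

    /** Wraps a 4-byte int ID, big-endian, as used for sequential IDs today. */
    static DocKey fromInt(int docId) {
        return new DocKey(ByteBuffer.allocate(4).putInt(docId).array());
    }

    /** Wraps a 16-byte random identifier such as a UUID. */
    static DocKey fromUuid(UUID uuid) {
        return new DocKey(ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array());
    }

    /** Width of this ID in bytes. */
    int width() {
        return bytes.length;
    }

    /** Unsigned lexicographic order (requires Java 9+). */
    @Override
    public int compareTo(DocKey other) {
        return Arrays.compareUnsigned(bytes, other.bytes);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DocKey && Arrays.equals(bytes, ((DocKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

With big-endian encoding, non-negative int IDs keep their numeric order under this comparison, so sequential IDs would continue to sort as they do now, while a backing store keyed by random identifiers could pass its own keys through unchanged.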
