I'm looking for a sample data set to benchmark the Lucene FST, specifically the keys. I'm guessing a common key type for HBase users is a timestamp? Perhaps simply generating timestamps for tens of millions of keys would be a reasonable benchmark? Though synthetic, it's also easy to adjust (e.g., increase or decrease the number of keys).
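
For what it's worth, here is a rough sketch of what I mean by synthetic timestamp keys. Everything in it is an assumption on my part rather than anything taken from real HBase data: the function name `timestamp_keys`, the start time, the random gap between timestamps, and the 8-byte big-endian encoding (chosen so that byte order matches numeric order, which matters because the FST builder expects its inputs in sorted order).

```python
import random
import struct


def timestamp_keys(n, start_ms=1_300_000_000_000, max_gap_ms=1000, seed=42):
    """Yield n strictly increasing timestamp keys as 8-byte big-endian values.

    All parameters are illustrative assumptions, not values from a real
    workload: start_ms is an arbitrary epoch-millisecond starting point and
    max_gap_ms bounds the random gap between consecutive keys.
    """
    rng = random.Random(seed)  # fixed seed so benchmark runs are repeatable
    ts = start_ms
    for _ in range(n):
        # Big-endian encoding keeps lexicographic byte order equal to
        # numeric order for non-negative values.
        yield struct.pack(">q", ts)
        ts += rng.randint(1, max_gap_ms)  # gap >= 1 keeps keys unique


if __name__ == "__main__":
    # Adjust the count to scale the benchmark up or down.
    keys = list(timestamp_keys(1_000_000))
    assert keys == sorted(keys)  # already in the order an FST builder needs
    print(len(keys), "keys,", len(keys[0]), "bytes each")
```

Changing `n` scales the data set, and changing `max_gap_ms` changes how much of each key is shared with its neighbor, which is probably the thing that affects FST size the most.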
- Sample data set of HBase Jason Rutherglen