Hi, Gaurav,

If you have two million unique identifiers, a good option is to sort
your data according to this identifier.  The exception to this might
be that you have some other operation that requires the data records
to be in a different order.
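
To illustrate why a sorted ID column is attractive for equality
lookups, here is a minimal Python sketch.  It is not FastBit code and
does not reflect FastBit's internals; the function name find_row and
the sample IDs are made up for illustration.  It only shows that once
the values are sorted, an equality lookup reduces to a binary search.

```python
from bisect import bisect_left

def find_row(sorted_ids, target):
    """Return the row number holding `target`, or None if it is absent.

    Assumes `sorted_ids` is in ascending order and its values are unique.
    """
    i = bisect_left(sorted_ids, target)  # O(log n) probe into the sorted column
    if i < len(sorted_ids) and sorted_ids[i] == target:
        return i
    return None

# Hypothetical sample column; the real column would hold about 2 million IDs.
ids = [3, 8, 15, 42, 99]
hit = find_row(ids, 42)
miss = find_row(ids, 7)
```

With 2 million rows, such a search needs about 21 comparisons per
lookup.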

As for choosing between binary encoding and equality encoding: since
you mentioned you will mostly be doing equality queries on this
column, it would be best to use the equality encoding.  There is
one exception.  If you really want to keep the index size small,
then the binary encoding produces the smaller index.  However, for 2
million rows, the index size should not be a serious issue, I presume.
If you are seriously worried about disk space, then sort the rows
according to this ID column.  Either use FastBit's sorting procedure or
tell FastBit that the data is already sorted according to this column.
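
As a rough illustration of the size trade-off, the Python sketch below
counts how many bitmaps each encoding needs for a column with a given
number of distinct values.  It is a simplified model, not FastBit
code: it ignores bitmap compression, and the function name
bitmap_counts is made up for this example.  Equality encoding keeps
one bitmap per distinct value, while binary encoding keeps one bitmap
per bit of the value's binary code.

```python
def bitmap_counts(distinct_values):
    """Count the bitmaps each encoding needs (simplified, uncompressed model)."""
    # Bits needed to give each of `distinct_values` values its own binary code.
    bits = max(1, (distinct_values - 1).bit_length())
    return {
        "equality": distinct_values,  # one bitmap per distinct value
        "binary": bits,               # one bitmap per bit of the code
    }

counts = bitmap_counts(2_000_000)
print(counts)
```

The other side of the trade-off is query cost: an equality query reads
a single bitmap under equality encoding, but has to combine all of the
bitmaps under binary encoding.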

John


On 7/17/12 10:32 AM, Gaurav Agarwal wrote:
> Hi John,
> 
> I have a column which contains about 2 million unique integers (total
> 2M rows). What would you recommend as the best option to index
> them for fastest equality query on this column (binary, equality or
> something else?). I need to use this column only in equality
> conditions ( this column is being treated as an identifier of the row
> and therefore I'll not be using this in range operations as well as
> any aggregate operations).
> 
> In general, could you pls help us decide between equality and binary
> indexing? If instead of integers, I had 2M unique text values, would
> binary indexing be the best option?
> 
> Regards,
> Gaurav
> 
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> 