Hi John,
I am interested in tuning my FastBit indexes to optimize query  
performance while keeping the file sizes from growing too much.  The  
most common query that my users pose extracts features that overlap  
with a specific genomic interval.  It uses these three columns:

dna - low cardinality (<100)
start - high cardinality (min = ~1, max = ~250 million)
end - high cardinality (min = ~1, max = ~250 million)

SELECT col1,col2,col3 FROM table WHERE dna = dna_id AND end > i_start  
AND start < i_end
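
To make the overlap condition concrete, here is a minimal Python sketch of the same predicate applied to a handful of made-up rows. It does not use FastBit at all; the names `overlaps`, `query`, and the sample `features` list are hypothetical and exist only for illustration.

```python
# Plain-Python illustration of the WHERE clause above (no FastBit involved).
# Each row is (name, dna, start, end); the values are invented sample data.
features = [
    ("f1", 1, 100, 150),
    ("f2", 1, 150, 300),
    ("f3", 1, 400, 500),
    ("f4", 2, 100, 150),
]


def overlaps(start, end, i_start, i_end):
    """True when [start, end] overlaps the query interval (strict bounds)."""
    return end > i_start and start < i_end


def query(rows, dna_id, i_start, i_end):
    """Return names of rows matching: dna = dna_id AND end > i_start AND start < i_end."""
    return [
        name
        for name, dna, start, end in rows
        if dna == dna_id and overlaps(start, end, i_start, i_end)
    ]


if __name__ == "__main__":
    print(query(features, 1, 120, 200))
```

Because both comparisons are strict, a feature that ends exactly at i_start (or starts exactly at i_end) is not reported as overlapping.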

The size of the query interval (i_end - i_start) varies from 20 to
200,000.
From my reading of the FastBit literature, it looks like the dna
column should be equality encoded and the other two columns should be
binned and range encoded, but I could be mistaken.

1. What are the default index specs for these columns?
2. Which other options should I try?
3. More generally, does the choice of index spec affect other
functions, such as histograms?

Thanks,
Andrew
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
