Hi John,

I am interested in tuning my FastBit indexes to optimize query performance while keeping the file sizes from growing too much. The most common query that my users pose extracts features that overlap with a specific genomic interval. It uses these three columns:

dna - low cardinality (<100)
start - high cardinality (min = ~1, max = ~250 million)
end - high cardinality (min = ~1, max = ~250 million)

SELECT col1,col2,col3 FROM table WHERE dna = dna_id AND end > i_start AND start < i_end

The size of the query interval (i_end - i_start) varies from 20 to 200,000.

From my reading of the FastBit literature, it looks like the dna column should be equality encoded and the other 2 columns should be binned and range encoded, but I could be mistaken.

1. What are the default index specs for these columns?
2. Which other options should I try?
3. More generally, does the choice of index spec impact other functions (histograms)?

Thanks,
Andrew
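
For concreteness, the overlap condition in the WHERE clause above can be sketched in plain Python. This is only an illustration of the predicate (end > i_start AND start < i_end), not FastBit code; the function name and the sample intervals are made up for the example.

```python
def overlaps(start, end, i_start, i_end):
    """Return True when the feature [start, end] overlaps the query interval.

    Mirrors the WHERE clause: end > i_start AND start < i_end.
    """
    return end > i_start and start < i_end


# Hypothetical (start, end) features on a single dna value.
features = [(100, 200), (150, 400), (500, 600)]

# Hypothetical query interval: i_start = 180, i_end = 450.
hits = [f for f in features if overlaps(f[0], f[1], 180, 450)]
print(hits)
```

Both comparisons are strict, so a feature that merely touches the query interval at an endpoint does not count as overlapping.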
