Hi,

I am a software engineer named Ying. Several months ago I started using FastBit 1.2.2 (examples/ibis.cpp) to query an index. My data set has ~2 billion rows and about 10 columns (the partition queried below has 207,255,767 rows, according to its metadata). I ran into a performance problem when using SELECT to output two columns: the query took about 60 seconds to return (64.9 CPU seconds, 101.2 seconds elapsed, per the log below). When I query only the count, the response is fast. Do you have any suggestions on how I could improve this? Thank you very much for your help.
Regards,
Ying

Query command:
/nfs/panda/production/ena/index/fastbit-ibis1.2.2/examples/ibis -d /nfs/panda/production/ena/index/fastbitindex/release -query 'select a,b where b=9606 LIMIT 10000, 10'
a, b (with counts)
"AA019136", 9606, 1
"AA019137", 9606, 1
"AA019138", 9606, 1
"AA019139", 9606, 1
"AA019140", 9606, 1
"AA019141", 9606, 1
"AA019142", 9606, 1
"AA019143", 9606, 1
"AA019144", 9606, 1
"AA019145", 9606, 1
doQuery:: evaluate(SELECT a,b FROM release WHERE b=9606 LIMIT 10000, 10) produced 28878169 hits, took 64.8621 CPU seconds, 101.228 elapsed seconds
The index structure looks like this:

# meta data for data partition release written by ibis::tafel::write on Tue Feb 15 21:19:11 2011
BEGIN HEADER
Name = release
Description = /homes/ycheng/fastbit-ibis1.2.1/examples/.libs/lt-ardea -d /nfs/panda/production/ena/index/fastbitindex/release -m a:text, b:int, c:text, d:long, e:category, f:int, g:text, h:text, i:category, j:text -t /nfs/panda/production/ena/index/fastbit/embl.csv
Number_of_rows = 207255767
Number_of_columns = 10
Timestamp = 1297804751
END HEADER

Begin Column
name = a
data_type = TEXT
index=none
End Column

Begin Column
name = b
data_type = INT
End Column

Begin Column
name = c
data_type = TEXT
index=none
End Column

Begin Column
name = d
data_type = LONG
End Column

Begin Column
name = e
data_type = CATEGORY
End Column

Begin Column
name = f
data_type = INT
End Column

Begin Column
name = g
data_type = TEXT
index=none
End Column

Begin Column
name = h
data_type = TEXT
index=none
End Column

Begin Column
name = i
data_type = CATEGORY
End Column

Begin Column
name = j
data_type = TEXT
index=none
End Column

And the index size is:

total 17547332
-rwxrwxrwx 1 datalib services1 2161017944 Feb 15 21:19 a
-rw-r--r-- 1 ycheng services1 1658046144 Feb 16 10:13 a.sp
-rwxrwxrwx 1 datalib services1 829023068 Feb 15 21:19 b
-rwxrwxrwx 1 ycheng services1 34104532 Feb 16 09:48 b.idx
-rwxrwxrwx 1 datalib services1 694598383 Feb 15 21:19 c
-rw-r--r-- 1 ycheng services1 1658046144 Feb 16 10:14 c.sp
-rwxrwxrwx 1 datalib services1 1658046136 Feb 15 21:19 d
-rwxrwxrwx 1 datalib services1 1576780344 Feb 15 21:20 e
-rw-r--r-- 1 ycheng services1 80 Feb 16 10:15 e.dic
-rw-r--r-- 1 ycheng services1 385424 Feb 16 10:15 e.idx
-rwxrwxrwx 1 datalib services1 829023068 Feb 15 21:20 f
-rw-r--r-- 1 ycheng services1 28403764 Feb 16 13:49 f.idx
-rwxrwxrwx 1 datalib services1 30652 Feb 15 21:20 f.msk
-rwxrwxrwx 1 datalib services1 355338336 Feb 15 21:20 g
-rwxrwxrwx 1 datalib services1 593292 Feb 15 21:20 g.msk
-rwxrwxrwx 1 datalib services1 292419708 Feb 15 21:20 h
-rwxrwxrwx 1 datalib services1 593292 Feb 15 21:20 h.msk
-rwxrwxrwx 1 datalib services1 1822783677 Feb 15 21:20 i
-rwxrwxrwx 1 datalib services1 593292 Feb 15 21:20 i.msk
-rwxrwxrwx 1 datalib services1 207439183 Feb 15 21:21 j
-rwxrwxrwx 1 datalib services1 593292 Feb 15 21:20 j.msk
-rwxrwxrwx 1 datalib services1 1047 Feb 15 21:21 -part.txt
_______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
