Hi,

My name is Ying and I am a software engineer. Several months ago I started using examples/ibis.cpp from FastBit 1.2.2 to query an index. My data set has about 207 million rows (Number_of_rows = 207255767 in the metadata below) and 10 columns. I have run into a performance problem when using select to output two columns: the query below took about 65 CPU seconds (101 seconds elapsed) to return its result. If I query only the count, it returns quickly. Do you have any suggestions for how I could improve this? Thank you very much for your help.

Regards
Ying

Query command:
/nfs/panda/production/ena/index/fastbit-ibis1.2.2/examples/ibis -d /nfs/panda/production/ena/index/fastbitindex/release -query 'select a,b where b=9606 LIMIT 10000, 10'
a, b (with counts)
"AA019136", 9606,       1
"AA019137", 9606,       1
"AA019138", 9606,       1
"AA019139", 9606,       1
"AA019140", 9606,       1
"AA019141", 9606,       1
"AA019142", 9606,       1
"AA019143", 9606,       1
"AA019144", 9606,       1
"AA019145", 9606,       1
doQuery:: evaluate(SELECT a,b FROM release WHERE b=9606 LIMIT 10000, 10) produced 28878169 hits, took 64.8621 CPU seconds, 101.228 elapsed seconds
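
As a rough sanity check on the log line above, the following sketch works out what fraction of the partition matched the condition and how much of the elapsed time was CPU time. The hit count and timings are copied from the doQuery line and the row count from Number_of_rows in the partition header; the variable names are illustrative only and are not part of FastBit.

```python
# Figures copied from the doQuery log line and the partition header.
hits = 28_878_169            # rows matching b=9606
rows = 207_255_767           # Number_of_rows in the header
cpu_seconds = 64.8621
elapsed_seconds = 101.228

# Fraction of all rows selected by the WHERE clause.
hit_fraction = hits / rows

# Share of wall-clock time spent on the CPU; the remainder is
# presumably waiting (for example on I/O), which is an assumption.
cpu_share = cpu_seconds / elapsed_seconds

print(f"hit fraction: {hit_fraction:.1%}")
print(f"CPU share of elapsed time: {cpu_share:.1%}")
```

With these figures roughly one row in seven matches, so the ten rows printed under the LIMIT clause come from a result set of tens of millions of hits.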

The index structure looks like:

# meta data for data partition release written by ibis::tafel::write on Tue Feb 15 21:19:11 2011

BEGIN HEADER
Name = release
Description = /homes/ycheng/fastbit-ibis1.2.1/examples/.libs/lt-ardea -d /nfs/panda/production/ena/index/fastbitindex/release -m a:text, b:int, c:text, d:long, e:category, f:int, g:text, h:text, i:category, j:text -t /nfs/panda/production/ena/index/fastbit/embl.csv
Number_of_rows = 207255767
Number_of_columns = 10
Timestamp = 1297804751
END HEADER

Begin Column
name = a
data_type = TEXT
index=none
End Column

Begin Column
name = b
data_type = INT
End Column

Begin Column
name = c
data_type = TEXT
index=none
End Column

Begin Column
name = d
data_type = LONG
End Column

Begin Column
name = e
data_type = CATEGORY
End Column

Begin Column
name = f
data_type = INT
End Column

Begin Column
name = g
data_type = TEXT
index=none
End Column

Begin Column
name = h
data_type = TEXT
index=none
End Column

Begin Column
name = i
data_type = CATEGORY
End Column

Begin Column
name = j
data_type = TEXT
index=none
End Column

And the index size is:

total 17547332
-rwxrwxrwx 1 datalib services1 2161017944 Feb 15 21:19 a
-rw-r--r-- 1 ycheng  services1 1658046144 Feb 16 10:13 a.sp
-rwxrwxrwx 1 datalib services1  829023068 Feb 15 21:19 b
-rwxrwxrwx 1 ycheng  services1   34104532 Feb 16 09:48 b.idx
-rwxrwxrwx 1 datalib services1  694598383 Feb 15 21:19 c
-rw-r--r-- 1 ycheng  services1 1658046144 Feb 16 10:14 c.sp
-rwxrwxrwx 1 datalib services1 1658046136 Feb 15 21:19 d
-rwxrwxrwx 1 datalib services1 1576780344 Feb 15 21:20 e
-rw-r--r-- 1 ycheng  services1         80 Feb 16 10:15 e.dic
-rw-r--r-- 1 ycheng  services1     385424 Feb 16 10:15 e.idx
-rwxrwxrwx 1 datalib services1  829023068 Feb 15 21:20 f
-rw-r--r-- 1 ycheng  services1   28403764 Feb 16 13:49 f.idx
-rwxrwxrwx 1 datalib services1      30652 Feb 15 21:20 f.msk
-rwxrwxrwx 1 datalib services1  355338336 Feb 15 21:20 g
-rwxrwxrwx 1 datalib services1     593292 Feb 15 21:20 g.msk
-rwxrwxrwx 1 datalib services1  292419708 Feb 15 21:20 h
-rwxrwxrwx 1 datalib services1     593292 Feb 15 21:20 h.msk
-rwxrwxrwx 1 datalib services1 1822783677 Feb 15 21:20 i
-rwxrwxrwx 1 datalib services1     593292 Feb 15 21:20 i.msk
-rwxrwxrwx 1 datalib services1  207439183 Feb 15 21:21 j
-rwxrwxrwx 1 datalib services1     593292 Feb 15 21:20 j.msk
-rwxrwxrwx 1 datalib services1       1047 Feb 15 21:21 -part.txt
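
The file sizes in this listing can be cross-checked against Number_of_rows in the header. The sketch below is only an arithmetic check under stated assumptions: INT values occupy 4 bytes, LONG values occupy 8 bytes, and each .sp file holds one 8-byte offset per row plus one terminating offset. The helper name implied_rows is hypothetical.

```python
ROWS = 207_255_767  # Number_of_rows from the header

# Sizes in bytes copied from the directory listing above.
SIZES = {
    "b": 829_023_068,       # INT column
    "f": 829_023_068,       # INT column
    "d": 1_658_046_136,     # LONG column
    "a.sp": 1_658_046_144,  # offsets for TEXT column a
}


def implied_rows(size_bytes: int, width: int, extra_entries: int = 0) -> int:
    """Return the row count implied by a file of fixed-width entries.

    extra_entries is the number of entries that do not correspond to a
    row (assumed to be 1 for .sp files, 0 for raw column files).
    """
    entries, remainder = divmod(size_bytes, width)
    if remainder:
        raise ValueError("size is not a multiple of the assumed width")
    return entries - extra_entries


# Assumed widths: INT = 4 bytes, LONG = 8 bytes, .sp offset = 8 bytes.
checks = {
    "b": implied_rows(SIZES["b"], 4),
    "f": implied_rows(SIZES["f"], 4),
    "d": implied_rows(SIZES["d"], 8),
    "a.sp": implied_rows(SIZES["a.sp"], 8, extra_entries=1),
}

for name, count in checks.items():
    status = "matches" if count == ROWS else "differs from"
    print(f"{name}: {count} rows, {status} the header")
```

Under these assumptions every checked file is consistent with the row count in the header, which puts the partition at about 207 million rows.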

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
