Hi, Ying,

Thanks for your interest in FastBit.  My guess is that you are 
printing the output to the screen.  If so, one way to reduce the time 
is to send the output to a file, either with the '-output filename' 
option or by redirecting standard output in the shell with 
'> filename'.
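
As a rough sketch, using the paths and query from your message, the 
two variants would look something like the following.  The file name 
result.txt is only a placeholder, and the check for the binary is 
there just so the snippet fails gently on another machine.

```shell
# Paths and query are copied from the original message.
IBIS=/nfs/panda/production/ena/index/fastbit-ibis1.2.2/examples/ibis
DATA=/nfs/panda/production/ena/index/fastbitindex/release
QUERY='select a,b where b=9606 LIMIT 10000, 10'
OUT=result.txt   # hypothetical output file name

if [ -x "$IBIS" ]; then
    # Variant 1: let the shell redirect standard output to a file.
    "$IBIS" -d "$DATA" -query "$QUERY" > "$OUT"
    # Variant 2 (alternative): have ibis write the file itself.
    # "$IBIS" -d "$DATA" -query "$QUERY" -output "$OUT"
else
    echo "ibis not found at $IBIS" >&2
fi
```

Comparing the elapsed time reported by doQuery with and without the 
redirection should show whether screen output is the bottleneck.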

If you are already redirecting the output, then your problem is likely 
due to slow processing of string values.  FastBit has two different 
ways of dealing with strings: as categorical values or as plain text.  
If the string column you are selecting is 'categorical', the output 
may be a little faster.  Column a in your partition is declared as 
text.

Please feel free to let us know if you have any additional questions.

John


On 2/16/2011 7:43 AM, Ying Cheng wrote:
> Hi,
>
> I am a software engineer and my name is Ying. Several months ago I
> started using fastbit1.2.2 examples/ibis.cpp to query an index. My
> data set has ~2 billion rows and about 10 columns.
> I ran into a performance problem when using select to output two
> columns: it took about 60 seconds to get the result back. If I only
> query the count, it is fast. Do you have any suggestions for
> improving this? Thank you very much for your help.
>
> Regards
> Ying
>
> Query command:
> /nfs/panda/production/ena/index/fastbit-ibis1.2.2/examples/ibis -d
> /nfs/panda/production/ena/index/fastbitindex/release -query 'select
> a,b where b=9606 LIMIT 10000, 10'
> a, b (with counts)
> "AA019136", 9606, 1
> "AA019137", 9606, 1
> "AA019138", 9606, 1
> "AA019139", 9606, 1
> "AA019140", 9606, 1
> "AA019141", 9606, 1
> "AA019142", 9606, 1
> "AA019143", 9606, 1
> "AA019144", 9606, 1
> "AA019145", 9606, 1
> doQuery:: evaluate(SELECT a,b FROM release WHERE b=9606 LIMIT 10000,
> 10) produced 28878169 hits, took 64.8621 CPU seconds, 101.228 elapsed
> seconds
>
> The index structure looks like:
>
> # meta data for data partition release written by ibis::tafel::write on Tue Feb 15 21:19:11 2011
>
> BEGIN HEADER
> Name = release
> Description =
> //homes/ycheng/fastbit-ibis1.2.1/examples//.libs/lt-ardea -d
> /nfs/panda/production/ena/index/fastbitindex/release -m a:text, b:int,
> c:text, d:long, e:category, f:int, g:text, h:text, i:category,
> j:text -t /nfs/panda/production/ena/index/fastbit/embl.csv
> Number_of_rows = 207255767
> Number_of_columns = 10
> Timestamp = 1297804751
> END HEADER
>
> Begin Column
> name = a
> data_type = TEXT
> index=none
> End Column
>
> Begin Column
> name = b
> data_type = INT
> End Column
>
> Begin Column
> name = c
> data_type = TEXT
> index=none
> End Column
>
> Begin Column
> name = d
> data_type = LONG
> End Column
>
> Begin Column
> name = e
> data_type = CATEGORY
> End Column
>
> Begin Column
> name = f
> data_type = INT
> End Column
>
> Begin Column
> name = g
> data_type = TEXT
> index=none
> End Column
>
> Begin Column
> name = h
> data_type = TEXT
> index=none
> End Column
>
> Begin Column
> name = i
> data_type = CATEGORY
> End Column
>
> Begin Column
> name = j
> data_type = TEXT
> index=none
> End Column
>
> And the index size is:
>
> total 17547332
> -rwxrwxrwx 1 datalib services1 2161017944 Feb 15 21:19 a
> -rw-r--r-- 1 ycheng services1 1658046144 Feb 16 10:13 a.sp
> -rwxrwxrwx 1 datalib services1 829023068 Feb 15 21:19 b
> -rwxrwxrwx 1 ycheng services1 34104532 Feb 16 09:48 b.idx
> -rwxrwxrwx 1 datalib services1 694598383 Feb 15 21:19 c
> -rw-r--r-- 1 ycheng services1 1658046144 Feb 16 10:14 c.sp
> -rwxrwxrwx 1 datalib services1 1658046136 Feb 15 21:19 d
> -rwxrwxrwx 1 datalib services1 1576780344 Feb 15 21:20 e
> -rw-r--r-- 1 ycheng services1 80 Feb 16 10:15 e.dic
> -rw-r--r-- 1 ycheng services1 385424 Feb 16 10:15 e.idx
> -rwxrwxrwx 1 datalib services1 829023068 Feb 15 21:20 f
> -rw-r--r-- 1 ycheng services1 28403764 Feb 16 13:49 f.idx
> -rwxrwxrwx 1 datalib services1 30652 Feb 15 21:20 f.msk
> -rwxrwxrwx 1 datalib services1 355338336 Feb 15 21:20 g
> -rwxrwxrwx 1 datalib services1 593292 Feb 15 21:20 g.msk
> -rwxrwxrwx 1 datalib services1 292419708 Feb 15 21:20 h
> -rwxrwxrwx 1 datalib services1 593292 Feb 15 21:20 h.msk
> -rwxrwxrwx 1 datalib services1 1822783677 Feb 15 21:20 i
> -rwxrwxrwx 1 datalib services1 593292 Feb 15 21:20 i.msk
> -rwxrwxrwx 1 datalib services1 207439183 Feb 15 21:21 j
> -rwxrwxrwx 1 datalib services1 593292 Feb 15 21:20 j.msk
> -rwxrwxrwx 1 datalib services1 1047 Feb 15 21:21 -part.txt
>
>
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users