Hi, Kishore,

I would love to see you try the new set-value feature, however, since
you are more concerned about the query performance, I will suggest
that you create the cross-product out of the set-values.

Turn

a:1,2,3  b:23,343 c:23,123 metric1:12 metric2:34

into 12 rows as follows

a:1  b:23 c:23 metric1:12 metric2:34
a:2  b:23 c:23 metric1:12 metric2:34
a:3  b:23 c:23 metric1:12 metric2:34
a:1  b:343 c:23 metric1:12 metric2:34
a:2  b:343 c:23 metric1:12 metric2:34
a:3  b:343 c:23 metric1:12 metric2:34
a:1  b:23 c:123 metric1:12 metric2:34
a:2  b:23 c:123 metric1:12 metric2:34
a:3  b:23 c:123 metric1:12 metric2:34
a:1  b:343 c:123 metric1:12 metric2:34
a:2  b:343 c:123 metric1:12 metric2:34
a:3  b:343 c:123 metric1:12 metric2:34

In this form, all a, b, and c are simple integers. Queries on them
should be pretty fast.

John



On 8/10/12 12:48 PM, kishore g wrote:
> Thanks a lot. A follow up question on this is how will this impact
> performance and if there is an overhead of maintaining additional
> dictionary. Because we go through the pain of converting actual
> strings into int to speed up performance and avoid dictionary over
> head but converting them back into string does look like a good option
> 
> thanks,
> Kishore G
> 
> On Fri, Aug 10, 2012 at 11:25 AM, K. John Wu <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Hi, Kishore,
> 
>     Funny thing you should ask this.  We have just implemented something
>     that might do it for you.  Here is how it might go.
> 
>     The way I interpret the row of data you have is as follows,
> 
>     The column named 'a' contains a set of integers: 1, 2, and 3.
>     The column named 'b' contains a set of integers: 23, and 343.
>     The column named 'c' contains a set of integers: 23, and 123.
> 
>     Currently, the only way to support a column with set-values (i.e.,
>     each value is a set of something, as oppose to a simple value) is to
>     treat the set as a string.  We have to treat it as a string because it
>     is the only type that support variable-length data.  If you tell
>     FastBit columns 'a', 'b', and 'c' are ibis::TEXT and then tell ibis to
>     build indexes with the following option
> 
>     keywords delimiters=','
> 
>     The above option tell ibis.cpp to build keyword indexes for the text
>     data and sets the delimiter to be coma ','.
> 
>     You need to modify your query expression a little bit, "a = 1" needs
>     to be "a contains '1'".  Literally, this means the value '1' is in the
>     set represented by 'a'.  The quote around 1 is necessary because it
>     would otherwise be interpreted as an integer.
> 
>     "Why do I have to go through this much trouble?" you might ask.  Well,
>     the problem is that SQL does not support set-values.  In fact, it does
>     not support any sort of composite values.  This is why you have hide
>     set-values as arbitrary text.  If you don't want to go through the
>     above trouble, you might consider make a cross products of all 12
>     possible combinations that is expressed in the single row in your
>     example.  Expressing the single as 12 rows might still give you faster
>     operations because FastBit will be able to treat integers properly as
>     integers instead of as strings.
> 
>     Hope this helps.
> 
>     John
> 
> 
> 
>     PS: one way to tell ibis.cpp to build indexes for columns 'a', 'b',
>     'c' with the above mentioned indexing option is to use the following
>     command line option
> 
>     -b "a:keywords delimiters=','" -b "b:keywords delimiters=','" -b
>     "c:keywords delimiters=','"
> 
>     or
> 
>     -b "?:keywords delimiters=','"
> 
>     (the above option assume that columns 'a', 'b', and 'c' are the only
>     ones with a single-letter column names)
> 
> 
>     On 8/10/12 8:50 AM, kishore g wrote:
>     > Hi,
>     >
>     > Is there an example on how to set multiple values for a column
>     for one
>     > row.
>     >
>     > For example
>     >
>     > a:1,2,3  b:23,343 c:23,123 metric1:12 metric2:34
>     >
>     > The following queries should include this row in the result
>     >
>     > select from table where a=1
>     > select from table where a=2
>     > select from table where a=1 and b=23
>     > select from table where a=1 or b=99 or c=89
>     >
>     > thanks,
>     > Kishore G
>     >
>     >
>     > _______________________________________________
>     > FastBit-users mailing list
>     > [email protected] <mailto:[email protected]>
>     > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>     >
> 
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to