Hi John & Kishore,
I think it would be useful to be able to have columns that are set of 
(signed/unsigned integer types, float, double, category).  These could be 
stored as strings, but indexed according to the member type.  You would need to 
specify a subtype in the --part.txt along with the delimiter (defaults to 
double and ',').  I haven't looked at the code, but I imagine it wouldn't be 
too hard to construct the usual bitmap index from a list of lists rather than 
single values.  Then, queries that access the bitmap index should work out of 
the box.
Andrew


On Aug 10, 2012, at 4:29 PM, kishore g wrote:

Hi John,

I actually prefer the column as text with delimiter approach instead of 
creating the cross product. I want to understand why would there be a 
performance impact. Will it impact all types of queries or only few. My 
understanding is if i need to only get the number of hits( based on where 
clause) and if aggregation functions( sum, count(*)) are on single values 
columns there should be no performance impact.

Please correct me if my understanding is correct.

Thanks,
Kishore G

On Fri, Aug 10, 2012 at 1:20 PM, K. John Wu <[email protected]<mailto:[email protected]>> 
wrote:
Hi, Kishore,

I would love to see you try the new set-value feature, however, since
you are more concerned about the query performance, I will suggest
that you create the cross-product out of the set-values.

Turn

a:1,2,3  b:23,343 c:23,123 metric1:12 metric2:34

into 12 rows as follows

a:1  b:23 c:23 metric1:12 metric2:34
a:2  b:23 c:23 metric1:12 metric2:34
a:3  b:23 c:23 metric1:12 metric2:34
a:1  b:343 c:23 metric1:12 metric2:34
a:2  b:343 c:23 metric1:12 metric2:34
a:3  b:343 c:23 metric1:12 metric2:34
a:1  b:23 c:123 metric1:12 metric2:34
a:2  b:23 c:123 metric1:12 metric2:34
a:3  b:23 c:123 metric1:12 metric2:34
a:1  b:343 c:123 metric1:12 metric2:34
a:2  b:343 c:123 metric1:12 metric2:34
a:3  b:343 c:123 metric1:12 metric2:34

In this form, all a, b, and c are simple integers. Queries on them
should be pretty fast.

John



On 8/10/12 12:48 PM, kishore g wrote:
> Thanks a lot. A follow up question on this is how will this impact
> performance and if there is an overhead of maintaining additional
> dictionary. Because we go through the pain of converting actual
> strings into int to speed up performance and avoid dictionary over
> head but converting them back into string does look like a good option
>
> thanks,
> Kishore G
>
> On Fri, Aug 10, 2012 at 11:25 AM, K. John Wu 
> <[email protected]<mailto:[email protected]>
> <mailto:[email protected]<mailto:[email protected]>>> wrote:
>
>     Hi, Kishore,
>
>     Funny thing you should ask this.  We have just implemented something
>     that might do it for you.  Here is how it might go.
>
>     The way I interpret the row of data you have is as follows,
>
>     The column named 'a' contains a set of integers: 1, 2, and 3.
>     The column named 'b' contains a set of integers: 23, and 343.
>     The column named 'c' contains a set of integers: 23, and 123.
>
>     Currently, the only way to support a column with set-values (i.e.,
>     each value is a set of something, as oppose to a simple value) is to
>     treat the set as a string.  We have to treat it as a string because it
>     is the only type that support variable-length data.  If you tell
>     FastBit columns 'a', 'b', and 'c' are ibis::TEXT and then tell ibis to
>     build indexes with the following option
>
>     keywords delimiters=','
>
>     The above option tell ibis.cpp to build keyword indexes for the text
>     data and sets the delimiter to be coma ','.
>
>     You need to modify your query expression a little bit, "a = 1" needs
>     to be "a contains '1'".  Literally, this means the value '1' is in the
>     set represented by 'a'.  The quote around 1 is necessary because it
>     would otherwise be interpreted as an integer.
>
>     "Why do I have to go through this much trouble?" you might ask.  Well,
>     the problem is that SQL does not support set-values.  In fact, it does
>     not support any sort of composite values.  This is why you have hide
>     set-values as arbitrary text.  If you don't want to go through the
>     above trouble, you might consider make a cross products of all 12
>     possible combinations that is expressed in the single row in your
>     example.  Expressing the single as 12 rows might still give you faster
>     operations because FastBit will be able to treat integers properly as
>     integers instead of as strings.
>
>     Hope this helps.
>
>     John
>
>
>
>     PS: one way to tell ibis.cpp to build indexes for columns 'a', 'b',
>     'c' with the above mentioned indexing option is to use the following
>     command line option
>
>     -b "a:keywords delimiters=','" -b "b:keywords delimiters=','" -b
>     "c:keywords delimiters=','"
>
>     or
>
>     -b "?:keywords delimiters=','"
>
>     (the above option assume that columns 'a', 'b', and 'c' are the only
>     ones with a single-letter column names)
>
>
>     On 8/10/12 8:50 AM, kishore g wrote:
>     > Hi,
>     >
>     > Is there an example on how to set multiple values for a column
>     for one
>     > row.
>     >
>     > For example
>     >
>     > a:1,2,3  b:23,343 c:23,123 metric1:12 metric2:34
>     >
>     > The following queries should include this row in the result
>     >
>     > select from table where a=1
>     > select from table where a=2
>     > select from table where a=1 and b=23
>     > select from table where a=1 or b=99 or c=89
>     >
>     > thanks,
>     > Kishore G
>     >
>     >
>     > _______________________________________________
>     > FastBit-users mailing list
>     > [email protected]<mailto:[email protected]> 
> <mailto:[email protected]<mailto:[email protected]>>
>     > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>     >
>
>

_______________________________________________
FastBit-users mailing list
[email protected]<mailto:[email protected]>
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to