Hi John, I actually prefer the column as text with delimiter approach instead of creating the cross product. I want to understand why would there be a performance impact. Will it impact all types of queries or only few. My understanding is if i need to only get the number of hits( based on where clause) and if aggregation functions( sum, count(*)) are on single values columns there should be no performance impact.
Please correct me if my understanding is correct. Thanks, Kishore G On Fri, Aug 10, 2012 at 1:20 PM, K. John Wu <[email protected]> wrote: > Hi, Kishore, > > I would love to see you try the new set-value feature, however, since > you are more concerned about the query performance, I will suggest > that you create the cross-product out of the set-values. > > Turn > > a:1,2,3 b:23,343 c:23,123 metric1:12 metric2:34 > > into 12 rows as follows > > a:1 b:23 c:23 metric1:12 metric2:34 > a:2 b:23 c:23 metric1:12 metric2:34 > a:3 b:23 c:23 metric1:12 metric2:34 > a:1 b:343 c:23 metric1:12 metric2:34 > a:2 b:343 c:23 metric1:12 metric2:34 > a:3 b:343 c:23 metric1:12 metric2:34 > a:1 b:23 c:123 metric1:12 metric2:34 > a:2 b:23 c:123 metric1:12 metric2:34 > a:3 b:23 c:123 metric1:12 metric2:34 > a:1 b:343 c:123 metric1:12 metric2:34 > a:2 b:343 c:123 metric1:12 metric2:34 > a:3 b:343 c:123 metric1:12 metric2:34 > > In this form, all a, b, and c are simple integers. Queries on them > should be pretty fast. > > John > > > > On 8/10/12 12:48 PM, kishore g wrote: > > Thanks a lot. A follow up question on this is how will this impact > > performance and if there is an overhead of maintaining additional > > dictionary. Because we go through the pain of converting actual > > strings into int to speed up performance and avoid dictionary over > > head but converting them back into string does look like a good option > > > > thanks, > > Kishore G > > > > On Fri, Aug 10, 2012 at 11:25 AM, K. John Wu <[email protected] > > <mailto:[email protected]>> wrote: > > > > Hi, Kishore, > > > > Funny thing you should ask this. We have just implemented something > > that might do it for you. Here is how it might go. > > > > The way I interpret the row of data you have is as follows, > > > > The column named 'a' contains a set of integers: 1, 2, and 3. > > The column named 'b' contains a set of integers: 23, and 343. > > The column named 'c' contains a set of integers: 23, and 123. > > > > Currently, the only way to support a column with set-values (i.e., > > each value is a set of something, as oppose to a simple value) is to > > treat the set as a string. We have to treat it as a string because > it > > is the only type that support variable-length data. If you tell > > FastBit columns 'a', 'b', and 'c' are ibis::TEXT and then tell ibis > to > > build indexes with the following option > > > > keywords delimiters=',' > > > > The above option tell ibis.cpp to build keyword indexes for the text > > data and sets the delimiter to be coma ','. > > > > You need to modify your query expression a little bit, "a = 1" needs > > to be "a contains '1'". Literally, this means the value '1' is in > the > > set represented by 'a'. The quote around 1 is necessary because it > > would otherwise be interpreted as an integer. > > > > "Why do I have to go through this much trouble?" you might ask. > Well, > > the problem is that SQL does not support set-values. In fact, it > does > > not support any sort of composite values. This is why you have hide > > set-values as arbitrary text. If you don't want to go through the > > above trouble, you might consider make a cross products of all 12 > > possible combinations that is expressed in the single row in your > > example. Expressing the single as 12 rows might still give you > faster > > operations because FastBit will be able to treat integers properly as > > integers instead of as strings. > > > > Hope this helps. > > > > John > > > > > > > > PS: one way to tell ibis.cpp to build indexes for columns 'a', 'b', > > 'c' with the above mentioned indexing option is to use the following > > command line option > > > > -b "a:keywords delimiters=','" -b "b:keywords delimiters=','" -b > > "c:keywords delimiters=','" > > > > or > > > > -b "?:keywords delimiters=','" > > > > (the above option assume that columns 'a', 'b', and 'c' are the only > > ones with a single-letter column names) > > > > > > On 8/10/12 8:50 AM, kishore g wrote: > > > Hi, > > > > > > Is there an example on how to set multiple values for a column > > for one > > > row. > > > > > > For example > > > > > > a:1,2,3 b:23,343 c:23,123 metric1:12 metric2:34 > > > > > > The following queries should include this row in the result > > > > > > select from table where a=1 > > > select from table where a=2 > > > select from table where a=1 and b=23 > > > select from table where a=1 or b=99 or c=89 > > > > > > thanks, > > > Kishore G > > > > > > > > > _______________________________________________ > > > FastBit-users mailing list > > > [email protected] <mailto:[email protected]> > > > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users > > > > > > > >
_______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
