Re: [FastBit-users] Api to handle multi valued column in one row

kishore g Fri, 10 Aug 2012 13:29:20 -0700

Hi John,

I actually prefer the column as text with delimiter approach instead of
creating the cross product. I want to understand why would there be a
performance impact. Will it impact all types of queries or only few. My
understanding is if i need to only get the number of hits( based on where
clause) and if aggregation functions( sum, count(*)) are on single values
columns there should be no performance impact.


Please correct me if my understanding is correct.

Thanks,
Kishore G

On Fri, Aug 10, 2012 at 1:20 PM, K. John Wu <[email protected]> wrote:

> Hi, Kishore,
>
> I would love to see you try the new set-value feature, however, since
> you are more concerned about the query performance, I will suggest
> that you create the cross-product out of the set-values.
>
> Turn
>
> a:1,2,3  b:23,343 c:23,123 metric1:12 metric2:34
>
> into 12 rows as follows
>
> a:1  b:23 c:23 metric1:12 metric2:34
> a:2  b:23 c:23 metric1:12 metric2:34
> a:3  b:23 c:23 metric1:12 metric2:34
> a:1  b:343 c:23 metric1:12 metric2:34
> a:2  b:343 c:23 metric1:12 metric2:34
> a:3  b:343 c:23 metric1:12 metric2:34
> a:1  b:23 c:123 metric1:12 metric2:34
> a:2  b:23 c:123 metric1:12 metric2:34
> a:3  b:23 c:123 metric1:12 metric2:34
> a:1  b:343 c:123 metric1:12 metric2:34
> a:2  b:343 c:123 metric1:12 metric2:34
> a:3  b:343 c:123 metric1:12 metric2:34
>
> In this form, all a, b, and c are simple integers. Queries on them
> should be pretty fast.
>
> John
>
>
>
> On 8/10/12 12:48 PM, kishore g wrote:
> > Thanks a lot. A follow up question on this is how will this impact
> > performance and if there is an overhead of maintaining additional
> > dictionary. Because we go through the pain of converting actual
> > strings into int to speed up performance and avoid dictionary over
> > head but converting them back into string does look like a good option
> >
> > thanks,
> > Kishore G
> >
> > On Fri, Aug 10, 2012 at 11:25 AM, K. John Wu <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     Hi, Kishore,
> >
> >     Funny thing you should ask this.  We have just implemented something
> >     that might do it for you.  Here is how it might go.
> >
> >     The way I interpret the row of data you have is as follows,
> >
> >     The column named 'a' contains a set of integers: 1, 2, and 3.
> >     The column named 'b' contains a set of integers: 23, and 343.
> >     The column named 'c' contains a set of integers: 23, and 123.
> >
> >     Currently, the only way to support a column with set-values (i.e.,
> >     each value is a set of something, as oppose to a simple value) is to
> >     treat the set as a string.  We have to treat it as a string because
> it
> >     is the only type that support variable-length data.  If you tell
> >     FastBit columns 'a', 'b', and 'c' are ibis::TEXT and then tell ibis
> to
> >     build indexes with the following option
> >
> >     keywords delimiters=','
> >
> >     The above option tell ibis.cpp to build keyword indexes for the text
> >     data and sets the delimiter to be coma ','.
> >
> >     You need to modify your query expression a little bit, "a = 1" needs
> >     to be "a contains '1'".  Literally, this means the value '1' is in
> the
> >     set represented by 'a'.  The quote around 1 is necessary because it
> >     would otherwise be interpreted as an integer.
> >
> >     "Why do I have to go through this much trouble?" you might ask.
>  Well,
> >     the problem is that SQL does not support set-values.  In fact, it
> does
> >     not support any sort of composite values.  This is why you have hide
> >     set-values as arbitrary text.  If you don't want to go through the
> >     above trouble, you might consider make a cross products of all 12
> >     possible combinations that is expressed in the single row in your
> >     example.  Expressing the single as 12 rows might still give you
> faster
> >     operations because FastBit will be able to treat integers properly as
> >     integers instead of as strings.
> >
> >     Hope this helps.
> >
> >     John
> >
> >
> >
> >     PS: one way to tell ibis.cpp to build indexes for columns 'a', 'b',
> >     'c' with the above mentioned indexing option is to use the following
> >     command line option
> >
> >     -b "a:keywords delimiters=','" -b "b:keywords delimiters=','" -b
> >     "c:keywords delimiters=','"
> >
> >     or
> >
> >     -b "?:keywords delimiters=','"
> >
> >     (the above option assume that columns 'a', 'b', and 'c' are the only
> >     ones with a single-letter column names)
> >
> >
> >     On 8/10/12 8:50 AM, kishore g wrote:
> >     > Hi,
> >     >
> >     > Is there an example on how to set multiple values for a column
> >     for one
> >     > row.
> >     >
> >     > For example
> >     >
> >     > a:1,2,3  b:23,343 c:23,123 metric1:12 metric2:34
> >     >
> >     > The following queries should include this row in the result
> >     >
> >     > select from table where a=1
> >     > select from table where a=2
> >     > select from table where a=1 and b=23
> >     > select from table where a=1 or b=99 or c=89
> >     >
> >     > thanks,
> >     > Kishore G
> >     >
> >     >
> >     > _______________________________________________
> >     > FastBit-users mailing list
> >     > [email protected] <mailto:[email protected]>
> >     > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
> >     >
> >
> >
>

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Re: [FastBit-users] Api to handle multi valued column in one row

Reply via email to