Hi, Kishore,

I have not timed the queries yet, however, queries involving strings
will have to go through string comparisons, which is usually more
expensive than integer comparisons.

Because the set-values are treated as text, it has to be mapped
internal integer representation through a dictionary.  Every time,
ibis command-line tool is started, it will have to read the dictionary
into memory.  If you have a relatively large dictionary, then reading
this dictionary might take more time than actually processing the queries.

John


On 8/10/12 1:28 PM, kishore g wrote:
> Hi John,
> 
> I actually prefer the column as text with delimiter approach instead
> of creating the cross product. I want to understand why would there be
> a performance impact. Will it impact all types of queries or only few.
> My understanding is if i need to only get the number of hits( based on
> where clause) and if aggregation functions( sum, count(*)) are on
> single values columns there should be no performance impact.
> 
> Please correct me if my understanding is correct.
> 
> Thanks,
> Kishore G
> 
> On Fri, Aug 10, 2012 at 1:20 PM, K. John Wu <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Hi, Kishore,
> 
>     I would love to see you try the new set-value feature, however, since
>     you are more concerned about the query performance, I will suggest
>     that you create the cross-product out of the set-values.
> 
>     Turn
> 
>     a:1,2,3  b:23,343 c:23,123 metric1:12 metric2:34
> 
>     into 12 rows as follows
> 
>     a:1  b:23 c:23 metric1:12 metric2:34
>     a:2  b:23 c:23 metric1:12 metric2:34
>     a:3  b:23 c:23 metric1:12 metric2:34
>     a:1  b:343 c:23 metric1:12 metric2:34
>     a:2  b:343 c:23 metric1:12 metric2:34
>     a:3  b:343 c:23 metric1:12 metric2:34
>     a:1  b:23 c:123 metric1:12 metric2:34
>     a:2  b:23 c:123 metric1:12 metric2:34
>     a:3  b:23 c:123 metric1:12 metric2:34
>     a:1  b:343 c:123 metric1:12 metric2:34
>     a:2  b:343 c:123 metric1:12 metric2:34
>     a:3  b:343 c:123 metric1:12 metric2:34
> 
>     In this form, all a, b, and c are simple integers. Queries on them
>     should be pretty fast.
> 
>     John
> 
> 
> 
>     On 8/10/12 12:48 PM, kishore g wrote:
>     > Thanks a lot. A follow up question on this is how will this impact
>     > performance and if there is an overhead of maintaining additional
>     > dictionary. Because we go through the pain of converting actual
>     > strings into int to speed up performance and avoid dictionary over
>     > head but converting them back into string does look like a good
>     option
>     >
>     > thanks,
>     > Kishore G
>     >
>     > On Fri, Aug 10, 2012 at 11:25 AM, K. John Wu <[email protected]
>     <mailto:[email protected]>
>     > <mailto:[email protected] <mailto:[email protected]>>> wrote:
>     >
>     >     Hi, Kishore,
>     >
>     >     Funny thing you should ask this.  We have just implemented
>     something
>     >     that might do it for you.  Here is how it might go.
>     >
>     >     The way I interpret the row of data you have is as follows,
>     >
>     >     The column named 'a' contains a set of integers: 1, 2, and 3.
>     >     The column named 'b' contains a set of integers: 23, and 343.
>     >     The column named 'c' contains a set of integers: 23, and 123.
>     >
>     >     Currently, the only way to support a column with set-values
>     (i.e.,
>     >     each value is a set of something, as oppose to a simple
>     value) is to
>     >     treat the set as a string.  We have to treat it as a string
>     because it
>     >     is the only type that support variable-length data.  If you tell
>     >     FastBit columns 'a', 'b', and 'c' are ibis::TEXT and then
>     tell ibis to
>     >     build indexes with the following option
>     >
>     >     keywords delimiters=','
>     >
>     >     The above option tell ibis.cpp to build keyword indexes for
>     the text
>     >     data and sets the delimiter to be coma ','.
>     >
>     >     You need to modify your query expression a little bit, "a =
>     1" needs
>     >     to be "a contains '1'".  Literally, this means the value '1'
>     is in the
>     >     set represented by 'a'.  The quote around 1 is necessary
>     because it
>     >     would otherwise be interpreted as an integer.
>     >
>     >     "Why do I have to go through this much trouble?" you might
>     ask.  Well,
>     >     the problem is that SQL does not support set-values.  In
>     fact, it does
>     >     not support any sort of composite values.  This is why you
>     have hide
>     >     set-values as arbitrary text.  If you don't want to go
>     through the
>     >     above trouble, you might consider make a cross products of
>     all 12
>     >     possible combinations that is expressed in the single row in
>     your
>     >     example.  Expressing the single as 12 rows might still give
>     you faster
>     >     operations because FastBit will be able to treat integers
>     properly as
>     >     integers instead of as strings.
>     >
>     >     Hope this helps.
>     >
>     >     John
>     >
>     >
>     >
>     >     PS: one way to tell ibis.cpp to build indexes for columns
>     'a', 'b',
>     >     'c' with the above mentioned indexing option is to use the
>     following
>     >     command line option
>     >
>     >     -b "a:keywords delimiters=','" -b "b:keywords delimiters=','" -b
>     >     "c:keywords delimiters=','"
>     >
>     >     or
>     >
>     >     -b "?:keywords delimiters=','"
>     >
>     >     (the above option assume that columns 'a', 'b', and 'c' are
>     the only
>     >     ones with a single-letter column names)
>     >
>     >
>     >     On 8/10/12 8:50 AM, kishore g wrote:
>     >     > Hi,
>     >     >
>     >     > Is there an example on how to set multiple values for a column
>     >     for one
>     >     > row.
>     >     >
>     >     > For example
>     >     >
>     >     > a:1,2,3  b:23,343 c:23,123 metric1:12 metric2:34
>     >     >
>     >     > The following queries should include this row in the result
>     >     >
>     >     > select from table where a=1
>     >     > select from table where a=2
>     >     > select from table where a=1 and b=23
>     >     > select from table where a=1 or b=99 or c=89
>     >     >
>     >     > thanks,
>     >     > Kishore G
>     >     >
>     >     >
>     >     > _______________________________________________
>     >     > FastBit-users mailing list
>     >     > [email protected]
>     <mailto:[email protected]>
>     <mailto:[email protected]
>     <mailto:[email protected]>>
>     >     > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>     >     >
>     >
>     >
> 
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to