Hi, John,

Since tables with high-cardinality attributes are binned, let's take
an example for discussion. Suppose we have two tables like the ones below:

table1 column c1
bin0 : value range [0.0-0.3)
bin1 : value range [0.3-0.6)
bin2 : value range [0.6-0.9)
bin3 : value range [0.9-1.2)

table2 column d2
bin0 : value range [0.1-0.4)
bin1 : value range [0.4-0.7)
bin2 : value range [0.7-1.0)


How does FastBit evaluate a query such as "select table1.c1 as x,
table1.c2, table2.d2 from table1 join table2 on c1 = d1" if both c1
and d1 are high-cardinality attributes?
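
To make the question concrete, here is a small Python sketch of what I
imagine a bin-based join would have to start with. This is only my guess
for illustration, not how FastBit actually does it, and the names
(table1_bins, table2_bins, overlapping_bin_pairs) are made up. It lists
which bins of the two tables share part of their value range.

```python
# Bin boundaries copied from the example above; each bin is [low, high).
table1_bins = [(0.0, 0.3), (0.3, 0.6), (0.6, 0.9), (0.9, 1.2)]
table2_bins = [(0.1, 0.4), (0.4, 0.7), (0.7, 1.0)]


def overlapping_bin_pairs(left, right):
    """Return (i, j) for every pair of half-open bins whose ranges intersect."""
    pairs = []
    for i, (lo1, hi1) in enumerate(left):
        for j, (lo2, hi2) in enumerate(right):
            # Two half-open ranges intersect when the larger lower bound
            # is strictly below the smaller upper bound.
            if max(lo1, lo2) < min(hi1, hi2):
                pairs.append((i, j))
    return pairs


candidate_pairs = overlapping_bin_pairs(table1_bins, table2_bins)
for i, j in candidate_pairs:
    print(f"table1 bin{i} overlaps table2 bin{j}")
```

None of the bin boundaries line up, so every bin overlaps one or two
bins on the other side. I assume the raw values would then still have to
be compared within each overlapping pair to decide whether c1 = d1.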

Thanks,
Min




On Fri, Mar 19, 2010 at 11:54 PM, K. John Wu <[email protected]> wrote:
> Hi, Min,
>
> I am somewhat unsure of what operations you are referring to by
> "high-cardinality table join."  The following is a quick description
> of the binning strategy.  Please clarify your question and I will try
> to answer it again.
>
> John
>
> ----------------------
> One can explicitly tell FastBit to bin any numerical values by using
> an indexing specification containing a <binning .../> directive.
> However, if you neglect to specify an explicit directive, here is what
> happens.
>
> - for integer values, if the difference between the min and max is
> less than 1000 or less than 10% of the number of rows, then each
> distinct value will get its own bin (i.e., no binning).  Otherwise, a
> default binning strategy is used.
>
> - for floating-point values, the default binning strategy is used
>
> - the default binning strategy samples the current values, builds an
> exact histogram on the sampled values, and divides the histogram into a
> certain number of bins, typically around 10,000.  We call these
> approximate equal-weight bins.
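
If I read the default strategy correctly, it could be sketched in Python
roughly as below. This is my own illustration and not FastBit code: the
function name, the sample size, and the use of a sorted sample in place
of the exact histogram are all assumptions, and I use 10 bins instead of
about 10,000 to keep the output readable.

```python
import random


def equal_weight_bin_edges(values, nbins, sample_size, seed=0):
    """Choose bin edges so each bin holds about the same number of sampled values."""
    rng = random.Random(seed)
    # Step 1: sample the current values (hypothetical sample size).
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    # Step 2: the sorted sample stands in for an exact histogram of the sample.
    # Step 3: cut it into nbins pieces with equal numbers of sampled values.
    edges = [sample[(k * len(sample)) // nbins] for k in range(nbins)]
    edges.append(sample[-1])
    return edges


# Skewed test data: squaring uniform values pushes most of them toward 0.
data_rng = random.Random(42)
data = [data_rng.random() ** 2 for _ in range(20_000)]
edges = equal_weight_bin_edges(data, nbins=10, sample_size=2_000)
print(edges)
```

The edges come out closer together where the data is dense, so each bin
covers a similar number of rows rather than a similar range of values.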
>
>
>
> On 3/19/2010 3:57 AM, Min Zhou wrote:
>> Hi all,
>> Can anyone describe how FastBit implements a
>> high-cardinality table join?
>> Does it use binning? How does the join work?
>>
>>
>> Thanks,
>> Min
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>



-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com