Hi John,

Since tables with high-cardinality attributes would use bins, let's take an example to discuss. Suppose we have two tables like the ones below:

table1, column c1
  bin0: value range [0.0-0.3)
  bin1: value range [0.3-0.6)
  bin2: value range [0.6-0.9)
  bin3: value range [0.9-1.2)

table2, column d2
  bin0: value range [0.1-0.4)
  bin1: value range [0.4-0.7)
  bin2: value range [0.7-1.0)

How is a query like the following evaluated if both c1 and d1 are high-cardinality attributes?

  select table1.c1 as x, table1.c2, table2.d2 from table1 join table2 on c1 = d1

Thanks,
Min

On Fri, Mar 19, 2010 at 11:54 PM, K. John Wu <[email protected]> wrote:
> Hi, Min,
>
> I am somewhat unsure of what operations you are referring to by
> "high-cardinality table join." The following is a quick description
> of the binning strategy. Please clarify your question and I will give
> it another try to answer it.
>
> John
>
> ----------------------
> One can explicitly tell FastBit to bin any numerical values by using
> an indexing specification containing a <binning .../> directive.
> However, if you neglect to specify an explicit directive, here is what
> happens.
>
> - for integer values, if the difference between the min and max is
> less than 1000 or less than 10% of the number of rows, then each
> distinct value will get its own bin (i.e., no binning). Otherwise, a
> default binning strategy is used.
>
> - for floating-point values, the default binning strategy is used
>
> - the default binning strategy samples the current values, builds an
> exact histogram on the sampled values, and divides the histogram into
> a certain number of bins, typically around 10,000 bins. We call this
> approximate equal-weight bins.
>
>
>
> On 3/19/2010 3:57 AM, Min Zhou wrote:
>> Hi all,
>> Can anyone give me a description of how FastBit's implementation
>> deals with high-cardinality table joins?
>> Does it use binning? How do they join?
>>
>>
>> Thanks,
>> Min
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>

-- 
My research interests are distributed systems, parallel computing and bytecode based virtual machine.

My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
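
As a rough illustration of the default binning strategy John describes (sample the values, build an exact histogram of the sample, then cut it into bins of roughly equal weight), here is a minimal Python sketch. It is not FastBit code: the function names, the sampling method, and the rule for placing boundaries are all assumptions made for this example, and FastBit's actual choices may differ.

```python
import bisect
import random
from collections import Counter


def equal_weight_bin_edges(values, nbins, sample_size=None, seed=0):
    """Return sorted lower boundaries of roughly equal-weight bins.

    Hypothetical helper, not a FastBit function.
    """
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)
    # Step 1: sample the current values (or use all of them).
    if sample_size is not None and sample_size < len(values):
        sample = rng.sample(list(values), sample_size)
    else:
        sample = list(values)
    # Step 2: exact histogram of the sample, ordered by value.
    hist = sorted(Counter(sample).items())
    # Step 3: start a new bin each time the running count reaches
    # the next multiple of the per-bin target weight.
    target = len(sample) / nbins
    edges = [hist[0][0]]
    seen = 0  # number of sampled values strictly below the current one
    for value, count in hist:
        if len(edges) < nbins and seen >= target * len(edges):
            edges.append(value)
        seen += count
    return edges


def bin_index(edges, v):
    """Index of the bin whose lower boundary is the largest one <= v."""
    return max(bisect.bisect_right(edges, v) - 1, 0)


if __name__ == "__main__":
    edges = equal_weight_bin_edges(list(range(100)), nbins=4)
    print(edges)
    print(bin_index(edges, 30))
```

With heavily repeated values a single distinct value can outweigh the target, so the bins are only approximately equal in weight, which matches the "approximate equal-weight bins" wording in the quoted reply.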

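
To make the example in the question concrete, the sketch below lists which bins of table1 overlap which bins of table2, using the ranges given above. This is only an illustration of why the question arises and not a description of how FastBit evaluates joins, which this thread does not state. The function name is hypothetical. Because the bin boundaries of the two columns do not line up, a match at the bin level can only identify candidate pairs; the raw values would still have to be compared to decide equality.

```python
def overlapping_bins(bins_a, bins_b):
    """Return (i, j) pairs where half-open ranges bins_a[i] and bins_b[j] intersect."""
    pairs = []
    for i, (lo_a, hi_a) in enumerate(bins_a):
        for j, (lo_b, hi_b) in enumerate(bins_b):
            # Two half-open intervals [lo, hi) intersect when each one
            # starts before the other ends.
            if lo_a < hi_b and lo_b < hi_a:
                pairs.append((i, j))
    return pairs


# Bin ranges copied from the example in the question.
table1_bins = [(0.0, 0.3), (0.3, 0.6), (0.6, 0.9), (0.9, 1.2)]
table2_bins = [(0.1, 0.4), (0.4, 0.7), (0.7, 1.0)]

if __name__ == "__main__":
    for i, j in overlapping_bins(table1_bins, table2_bins):
        print(f"table1 bin{i} may match table2 bin{j}")
```

For these ranges, six of the twelve possible bin pairs overlap, so bin-level information alone rules out half of the combinations.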