Hi, John,

Thank you very much for your reply.
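To make the mismatched-bin situation from the quoted example concrete, here is a minimal sketch in Python. It is an illustration only and not FastBit code: the function name `overlapping_bins` is hypothetical, and, as John notes below, FastBit does not currently use such bins to resolve join conditions. It only shows which pairs of bins from the two columns could possibly contain equal values.

```python
# Illustration only: not FastBit code. FastBit does not currently use
# mismatched bins to resolve join conditions.

def overlapping_bins(bins_a, bins_b):
    """Return (i, j) index pairs whose half-open ranges [lo, hi) intersect."""
    pairs = []
    for i, (lo_a, hi_a) in enumerate(bins_a):
        for j, (lo_b, hi_b) in enumerate(bins_b):
            # Two half-open intervals intersect when each one starts
            # before the other one ends.
            if lo_a < hi_b and lo_b < hi_a:
                pairs.append((i, j))
    return pairs

# Bin boundaries copied from the example in the quoted thread.
c1_bins = [(0.0, 0.3), (0.3, 0.6), (0.6, 0.9), (0.9, 1.2)]
d2_bins = [(0.1, 0.4), (0.4, 0.7), (0.7, 1.0)]

candidate_pairs = overlapping_bins(c1_bins, d2_bins)
print(candidate_pairs)
```

Because the boundaries differ, an overlapping pair only means that equal values are possible. Every candidate pair would still need its raw values compared.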
Min

On Tue, Mar 23, 2010 at 1:58 AM, K. John Wu <[email protected]> wrote:
> Hi, Min,
>
> Thanks for clarifying the question.  Since the bins do NOT match, they
> are not currently used to resolve join conditions.  There are some
> ways of using these indexes to resolve join conditions anyway, but we
> are not currently doing that.
>
> John
>
>
> On 3/19/2010 8:09 PM, Min Zhou wrote:
>> Hi, John,
>>
>> Since a table with high-cardinality attributes would use bins,
>> let's take an example for discussion. Suppose we have two tables like
>> the ones below:
>>
>> table1 column c1
>> bin0 : value range [0.0-0.3)
>> bin1 : value range [0.3-0.6)
>> bin2 : value range [0.6-0.9)
>> bin3 : value range [0.9-1.2)
>>
>> table2 column d2
>> bin0 : value range [0.1-0.4)
>> bin1 : value range [0.4-0.7)
>> bin2 : value range [0.7-1.0)
>>
>>
>> How does FastBit evaluate a query such as "select table1.c1 as x,
>> table1.c2, table2.d2 from table1 join table2 on c1 = d1" if both c1
>> and d1 are high-cardinality attributes?
>>
>> Thanks,
>> Min
>>
>>
>> On Fri, Mar 19, 2010 at 11:54 PM, K. John Wu <[email protected]> wrote:
>>> Hi, Min,
>>>
>>> I am somewhat unsure of what operations you are referring to by
>>> "high-cardinality table join."  The following is a quick description
>>> of the binning strategy.  Please clarify your question and I will
>>> try again to answer it.
>>>
>>> John
>>>
>>> ----------------------
>>> One can explicitly tell FastBit to bin any numerical values by using
>>> an indexing specification containing a <binning .../> directive.
>>> However, if you neglect to specify an explicit directive, here is what
>>> happens.
>>>
>>> - for integer values, if the difference between the min and max is
>>>   less than 1000 or less than 10% of the number of rows, then each
>>>   distinct value will get its own bin (i.e., no binning).  Otherwise, a
>>>   default binning strategy is used.
>>>
>>> - for floating-point values, the default binning strategy is used
>>>
>>> - the default binning strategy samples the current values, builds an
>>>   exact histogram on the sampled values, and divides the histogram
>>>   into a certain number of bins, typically around 10,000 bins.  We
>>>   call these approximate equal-weight bins.
>>>
>>>
>>> On 3/19/2010 3:57 AM, Min Zhou wrote:
>>>> Hi all,
>>>> Can anyone give me a description of how FastBit implements
>>>> high-cardinality table joins?
>>>> Does it use binning? How are the joins performed?
>>>>
>>>>
>>>> Thanks,
>>>> Min
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>
>>
>
--
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
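As a footnote to the default binning rules in John's quoted description, the following Python sketch shows one way to read them. It is a rough illustration under stated assumptions, not FastBit's implementation: the function names `needs_binning` and `equal_weight_boundaries`, the `sample_size` parameter, and the exact way bin edges are picked from the histogram are all hypothetical. Only the thresholds (1000, 10% of the rows, around 10,000 bins) come from the message.

```python
import random
from collections import Counter

def needs_binning(int_values):
    """Integer rule as read from the message: each distinct value gets its
    own bin when (max - min) is below 1000 or below 10% of the row count."""
    spread = max(int_values) - min(int_values)
    one_bin_per_value = spread < 1000 or spread < 0.1 * len(int_values)
    return not one_bin_per_value

def equal_weight_boundaries(values, nbins=10000, sample_size=100000, seed=0):
    """Approximate equal-weight bins: sample the values, build an exact
    histogram of the sample, and cut it into roughly equal-count pieces.
    Returns the lower edges of bins 1 .. nbins-1 (at most)."""
    rng = random.Random(seed)
    # sample_size is an assumption; the message does not give a figure.
    if len(values) > sample_size:
        sample = rng.sample(values, sample_size)
    else:
        sample = list(values)
    histogram = sorted(Counter(sample).items())  # exact (value, count) pairs
    target = len(sample) / nbins                 # desired weight per bin
    edges, seen = [], 0
    for value, count in histogram:
        # Start a new bin once the running count reaches the next multiple
        # of the target weight.
        if len(edges) < nbins - 1 and seen >= target * (len(edges) + 1):
            edges.append(value)
        seen += count
    return edges

print(needs_binning(list(range(0, 5000, 5))))
print(equal_weight_boundaries(list(range(100)), nbins=4))
```

Because the edges are derived from a sample of each column's own values, two columns with different distributions will generally end up with different bin boundaries, which is the situation discussed in this thread.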
