Hi, John,

I very much appreciate your reply.

Min

On Tue, Mar 23, 2010 at 1:58 AM, K. John Wu <[email protected]> wrote:
> Hi, Min,
>
> Thanks for clarifying the question.  Since the bins do NOT match, they
> are not currently used to resolve join conditions.  There are some
> ways of using these indexes to resolve join conditions anyway, but we
> are not currently doing that.
>
> John
>
>
> On 3/19/2010 8:09 PM, Min Zhou wrote:
>> Hi, John,
>>
>> Since a table with high-cardinality attributes would use bins, let's
>> take an example for discussion. Suppose we have two tables as below,
>>
>> table1 column c1
>> bin0 : value range [0.0-0.3)
>> bin1 : value range [0.3-0.6)
>> bin2 : value range [0.6-0.9)
>> bin3 : value range [0.9-1.2)
>>
>> table2 column d2
>> bin0 : value range [0.1-0.4)
>> bin1 : value range [0.4-0.7)
>> bin2 : value range [0.7-1.0)
>>
>>
>> How does FastBit evaluate a query such as "select table1.c1 as x,
>> table1.c2, table2.d2 from table1 join table2 on c1 = d1" if both c1
>> and d1 are high-cardinality attributes?
>>
>> Thanks,
>> Min
>>
>>
>>
>>
>> On Fri, Mar 19, 2010 at 11:54 PM, K. John Wu <[email protected]> wrote:
>>> Hi, Min,
>>>
>>> I am somewhat unsure of what operations you are referring to by
>>> "high-cardinality table join."  The following is a quick description
>>> of the binning strategy.  Please clarify your question and I will
>>> try again to answer it.
>>>
>>> John
>>>
>>> ----------------------
>>> One can explicitly tell FastBit to bin any numerical values by using
>>> an indexing specification containing a <binning .../> directive.
>>> However, if you neglect to specify an explicit directive, here is what
>>> happens.
>>>
>>> - for integer values, if the difference between the min and max is
>>> less than 1000 or less than 10% of the number of rows, then each
>>> distinct value will get its own bin (i.e., no binning).  Otherwise, a
>>> default binning strategy is used.
>>>
>>> - for floating-point values, the default binning strategy is used.
>>>
>>> - the default binning strategy samples the current values, builds an
>>> exact histogram on the sampled values, and divides the histogram into
>>> a certain number of bins, typically around 10,000 bins.  We call these
>>> approximate equal-weight bins.
>>>
>>>
>>>
>>> On 3/19/2010 3:57 AM, Min Zhou wrote:
>>>> Hi all,
>>>> Can anyone describe how FastBit implements a high-cardinality table
>>>> join?
>>>> Does it use binning? How is the join performed?
>>>>
>>>>
>>>> Thanks,
>>>> Min
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>
>>
>>
>>
>

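The default binning strategy quoted above (sample the values, build an exact histogram of the sample, cut it into bins of roughly equal weight) can be sketched as follows. This is only an illustration, not FastBit's actual code: the function names `equal_weight_edges` and `bin_of`, the sample size, and the seed are assumptions made for the example.

```python
import bisect
import random
from collections import Counter


def equal_weight_edges(values, nbins, sample_size=10_000, seed=0):
    """Return the inner bin boundaries of approximately equal-weight bins.

    Hypothetical helper: samples the values, builds an exact histogram of
    the sample, and cuts it so each bin holds about the same number of
    sampled values.
    """
    rng = random.Random(seed)
    values = list(values)
    # Sample only when the column is larger than the sample size.
    sample = values if len(values) <= sample_size else rng.sample(values, sample_size)
    # Exact histogram of the sample: (value, count) pairs sorted by value.
    hist = sorted(Counter(sample).items())
    target = len(sample) / nbins  # desired weight per bin
    edges, seen = [], 0
    for value, count in hist:
        # Start a new bin at this value once the bins so far are full.
        if len(edges) < nbins - 1 and seen >= target * (len(edges) + 1):
            edges.append(value)
        seen += count
    return edges


def bin_of(value, edges):
    """Return the index of the bin that contains value."""
    return bisect.bisect_right(edges, value)


edges = equal_weight_edges(range(100), nbins=4)
```

With 100 evenly spread values and 4 bins, each bin ends up holding 25 values. With skewed data the boundaries move so that the counts, not the bin widths, stay roughly equal.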

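John mentions that there are ways to use mismatched bins for join conditions, although FastBit does not currently do so. One hypothetical approach, shown here purely as a sketch under that assumption, is to use the bin ranges to rule out bin pairs that cannot contain equal values. The function name `overlapping_bin_pairs` is invented for this example, and the ranges are the ones from the two example tables in the quoted message.

```python
def overlapping_bin_pairs(bins_a, bins_b):
    """Return (i, j) index pairs whose half-open ranges [lo, hi) intersect."""
    pairs = []
    for i, (lo_a, hi_a) in enumerate(bins_a):
        for j, (lo_b, hi_b) in enumerate(bins_b):
            # Two half-open ranges intersect when each starts before the other ends.
            if lo_a < hi_b and lo_b < hi_a:
                pairs.append((i, j))
    return pairs


# Bin ranges taken from the example tables in the quoted message.
table1_bins = [(0.0, 0.3), (0.3, 0.6), (0.6, 0.9), (0.9, 1.2)]
table2_bins = [(0.1, 0.4), (0.4, 0.7), (0.7, 1.0)]

candidates = overlapping_bin_pairs(table1_bins, table2_bins)
```

For the example ranges, 6 of the 12 possible bin pairs overlap. The bins only narrow the search: rows in an overlapping pair would still have to be compared on their actual values, because a bin records a range rather than the values themselves.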

-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com