Hi, Mark,

There is no limit imposed by FastBit on the number of columns in a
table.  One practical limitation might be how many files can be kept
open while creating the data partitions.  On most file systems, one
should be able to keep 1000 files open at the same time to write 1000
columns.  Beyond that, you will have to write the columns in batches,
one subset at a time.
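Below is a rough sketch of the batching described above. It is not FastBit code. It assumes that each column being written holds one open file, and the names `column_batches` and `reserved` are hypothetical. On Unix-like systems, Python's `resource.getrlimit(resource.RLIMIT_NOFILE)` reports the actual per-process limit, which you could pass in as `max_open_files`.

```python
def column_batches(columns, max_open_files, reserved=16):
    """Split column names into batches that can each be written while
    keeping the number of open files under max_open_files.

    Assumption: one open file per column being written; `reserved` is a
    hypothetical allowance for other handles (input files, logs, ...)."""
    per_batch = max_open_files - reserved
    if per_batch < 1:
        raise ValueError("max_open_files is too small for the reserved handles")
    return [columns[i:i + per_batch] for i in range(0, len(columns), per_batch)]


# Example: 2500 columns with a 1024-file limit -> three passes over the input.
cols = [f"c{i}" for i in range(2500)]
batches = column_batches(cols, max_open_files=1024)
print(len(batches), [len(b) for b in batches])  # 3 [1008, 1008, 484]
```

Each batch would then be written in its own pass over the input rows. This trades extra passes for staying within the file-handle limit.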

There is no problem having a few hundred columns.  We have worked on
applications with 500 columns, and there is a commercial product using
FastBit with 100 or so columns.  We should be fine.

John


On 5/30/13 12:25 PM, Mark Hansen wrote:
> John,
> 
> Thanks for the quick response.  Actually, I am talking about the number of
> columns in the table.  Individual queries will access a subset of these
> columns - maybe up to 12 at most.
> 
> Is having 100 columns and 100 million rows a problem?
> 
> -- Mark
> 
> 
> On 5/30/13 2:58 PM, "K. John Wu" <[email protected]> wrote:
> 
>> Hi, Mark,
>>
>> Thanks for your interest in FastBit software.
>> It looks like you are talking about the number of columns in a query
>> expression.  In this case, the practical limitation would be the memory
>> required to hold the columns in core.  A rough estimate goes like
>> this: if your query selects 1 million rows (before group-by
>> operations) and 100 columns, then you should be prepared to store 100
>> million values in memory.  For some group-by operations, a second copy
>> of the data is generated, which can double the space requirement.
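The back-of-the-envelope estimate above can be turned into bytes. This is a minimal sketch that assumes every value takes 8 bytes (a double or 64-bit integer). Text columns and narrower numeric types would change the figure. `estimate_query_memory` is a hypothetical helper.

```python
def estimate_query_memory(rows, columns, bytes_per_value=8, group_by_copy=False):
    """Rough in-core size of a query result: rows x columns values.

    Assumption: fixed-width values of `bytes_per_value` bytes each.
    `group_by_copy` models a group-by that makes a second copy of the data."""
    values = rows * columns
    total_bytes = values * bytes_per_value
    if group_by_copy:
        total_bytes *= 2
    return values, total_bytes


values, nbytes = estimate_query_memory(1_000_000, 100)
print(values, nbytes)  # 100000000 800000000
_, nbytes_gb = estimate_query_memory(1_000_000, 100, group_by_copy=True)
print(nbytes_gb)  # 1600000000
```

Under these assumptions, 1 million rows by 100 columns needs roughly 800 MB, or about 1.6 GB when a group-by makes a second copy.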
>>
>> If your group-by operations can be processed one data partition at a
>> time, less memory might be needed, but the above back-of-the-envelope
>> number is useful to keep in mind.
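Processing a group-by one data partition at a time can be sketched as follows. Partial SUM and COUNT values are kept per group key and merged across partitions. With this approach, only the current partition and the small per-group totals are held in memory. This is an illustration in plain Python and not FastBit's implementation. `load_partitions` is a hypothetical stand-in for reading partitions from disk, and NULLs are modeled as `None`.

```python
from collections import defaultdict


def load_partitions():
    # Hypothetical stand-in: yields one partition (column name -> values) at a time.
    yield {"d1": ["a", "b", "a"], "m2": [1.0, 2.0, 3.0]}
    yield {"d1": ["b", None, "a"], "m2": [4.0, 5.0, None]}


def group_sum_by_partition(partitions, key_col, val_col):
    # Running totals per group key: [sum, count]. These merge across partitions.
    acc = defaultdict(lambda: [0.0, 0])
    for part in partitions:
        for key, val in zip(part[key_col], part[val_col]):
            if key is None or val is None:  # skip NULLs
                continue
            acc[key][0] += val
            acc[key][1] += 1
    # Finalize into (SUM, COUNT, AVG) per group.
    return {key: (s, n, s / n) for key, (s, n) in acc.items()}


result = group_sum_by_partition(load_partitions(), "d1", "m2")
print(result)  # {'a': (4.0, 2, 2.0), 'b': (6.0, 2, 3.0)}
```

This works for aggregates that can be merged from partial results (SUM, COUNT, and AVG derived from them). Aggregates such as exact medians need a different approach.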
>>
>> John
>>
>>
>> On 5/30/13 11:47 AM, Mark Hansen wrote:
>>> I'm wondering if there is any practical limit on the number of columns
>>> in a FastBit "table"?
>>>
>>> My company is looking at FastBit as a potential backend for an
>>> analytics engine.  We would have one big table with about 100 columns
>>> - 30 of which need to be indexed for searching.  We call these the
>>> "dimensions".  The other 70 - the "metrics" - do not need to be
>>> indexed, but might need some aggregation like SUM, AVG, STDEV, etc.
>>> The data is going to be pretty sparse - typically 3-4 "dimensions" and
>>> 8-12 "metrics" are populated per row, with the rest of the columns set
>>> to NULL.
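With mostly-NULL metric columns, SUM, AVG and STDEV should skip the NULLs rather than treat them as zero. The sketch below computes all three in one pass using Welford's running-variance method. It is plain Python, not a FastBit API. NULL is modeled as `None`, and STDEV is assumed to be the sample standard deviation (divide by n - 1), as in SQL's `STDDEV_SAMP`.

```python
import math


def aggregate(values):
    """One-pass SUM/AVG/STDEV over a column, ignoring NULLs (None).

    Uses Welford's method for the running mean and sum of squared
    deviations; STDEV here is the sample standard deviation."""
    n, total, mean, m2 = 0, 0.0, 0.0, 0.0
    for x in values:
        if x is None:
            continue
        n += 1
        total += x
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        return {"count": 0, "sum": None, "avg": None, "stdev": None}
    stdev = math.sqrt(m2 / (n - 1)) if n > 1 else None
    return {"count": n, "sum": total, "avg": mean, "stdev": stdev}


print(aggregate([2.0, None, 4.0, None, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```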
>>>
>>> Queries will be typically like:
>>>
>>> SELECT d1, d5, d18, m2, m19, m29 FROM TABLE where d1 between
>>> lowerBound and upperBound
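The query shape above touches only the columns it names. Conceptually, it evaluates the range condition on `d1` and then reads the selected columns at the matching rows. The toy column store below illustrates this. It is an assumption-laden sketch: FastBit would answer the range condition with its bitmap indexes, whereas this code substitutes a plain linear scan. `select_where_between` and the sample data are hypothetical.

```python
def select_where_between(table, select_cols, where_col, lower, upper):
    """Toy column-store query: SELECT select_cols WHERE where_col BETWEEN lower AND upper.

    `table` maps column name -> list of values (None = NULL).
    Uses a linear scan, not a bitmap index."""
    # Step 1: evaluate the range condition on the single column it names.
    hits = [i for i, v in enumerate(table[where_col])
            if v is not None and lower <= v <= upper]
    # Step 2: read only the selected columns, and only at the matching rows.
    return {c: [table[c][i] for i in hits] for c in select_cols}


table = {
    "d1": [5, 12, None, 20, 15],
    "d5": ["x", "y", "z", "w", "v"],
    "m2": [1.0, None, 3.0, 4.0, 5.0],
    "m19": [0.1, 0.2, 0.3, 0.4, 0.5],  # never read by the query below
}
print(select_where_between(table, ["d1", "d5", "m2"], "d1", 10, 15))
# {'d1': [12, 15], 'd5': ['y', 'v'], 'm2': [None, 5.0]}
```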
>>>
>>> We are looking at about 100 columns and 100 MILLION rows.  A few of
>>> the dimension columns (which need to be indexed) contain arbitrary-length
>>> text values.
>>>
>>> We could partition the table and decrease the rows to about 10 MILLION
>>> if necessary.
>>>
>>> Does this sound like something that FastBit is well suited for?
>>>
>>> -- Mark
>>>
>>> -- 
>>> Mark Hansen
>>> Founder & President
>>> Digital Brand Mine | 708 3rd Ave | New York, New York 10017
>>> office: 212-961-7250
>>> cell: 914-924-3398
>>> http://digitalbrandmine.com/ | email: [email protected]
>>>
>>>
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
>>>