Bear in mind that how many files you'll have open simultaneously is a function of the number of regions, the number of column families, and how compaction organizes the HBase files on disk (the strategy in effect and its parameters, the current ingest rate, and so on). You can ballpark this as such: If you have one column family in a table, and store data into all the regions, then you will have one file open on the cluster per region, or more. If you have 100,000 column families in a table, and store data into all the regions and CFs, then you will have 100,000 files open on the cluster per region, *or more*. You will run into OS and HDFS limits attempting this; I don't recommend it.
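To make the ballpark concrete, here is a minimal back-of-envelope sketch of that multiplication. The region count and the files-per-store factor are illustrative assumptions; the real numbers depend on your cluster and compaction state.

    // Rough estimate of open store files on a cluster.
    // All inputs are illustrative; the true count depends on compaction
    // strategy, its parameters, and the current ingest rate.
    public class OpenFileEstimate {
        public static void main(String[] args) {
            long regions = 1000;       // assumed regions in the table
            long columnFamilies = 1;   // column families in the table
            long filesPerStore = 3;    // >= 1; grows between compactions

            // Each (region, column family) pair is a store, and each
            // store holds at least one HFile once it has been flushed.
            long openFiles = regions * columnFamilies * filesPerStore;
            System.out.println("~" + openFiles + " open files, or more");

            // With 100,000 column families the same math explodes:
            System.out.println("~" + (regions * 100000 * filesPerStore)
                    + " open files, or more");
        }
    }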
I don't think any reasonable schema design need produce a requirement for 100,000 column *families*. You can have any number of keys with <column>:<qualifier> in a column family; varying the <qualifier> across 100,000 or 1,000,000 or more unique values is no problem.

Can you say more about what you are trying to accomplish?

On Sat, Dec 21, 2013 at 7:17 AM, 乃岩 <[email protected]> wrote:
> Hi,
> Can anybody tell me if a future HBase release will integrate 3149,
> "Make flush decisions per column family"?
>
> By the way, for current HBase, is the simultaneous flush the only
> issue? I mean, creating 100,000 CFs will not be a problem, right?
>
> Thanks in advance!
>
> N.Y.

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
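For illustration, a minimal sketch of the single-CF, many-qualifiers approach suggested above. The table name "mytable", the family "d", and the qualifier scheme are all hypothetical, and this uses the HBase 1.0+ Java client API (the older HTable-based API differs in the details):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ManyQualifiersExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn =
                     ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("mytable"))) {
                Put put = new Put(Bytes.toBytes("row-1"));
                // One column family ("d"), many qualifiers. 1,000 here for
                // brevity; it could just as well be 100,000 or 1,000,000,
                // possibly batched across several Puts.
                for (int i = 0; i < 1000; i++) {
                    put.addColumn(Bytes.toBytes("d"),
                                  Bytes.toBytes("metric-" + i),
                                  Bytes.toBytes(i));
                }
                table.put(put);
            }
        }
    }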
