How to avoid stop-the-world GC for HBase Region Server under big heap size
Hi, We are running Region Server on big-memory machines (70G) and setting Xmx=64G. Most of the heap is used as block cache for random reads. Stop-the-world GC is killing the region server, but using a smaller heap (16G) doesn't utilize our machines well. Is there a concurrent or parallel GC option that won't block all threads? Any thought is appreciated. Thanks. Gen Liu
Re: How to avoid stop-the-world GC for HBase Region Server under big heap size
Slab cache might help: http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/ ./zahoor
client cache for all region server information?
Hello HBase masters, I am wondering whether, in the current implementation, each HBase client caches all region server information, for example where each region server is (the physical machine hosting it) and the row-key range each region server manages. If so, two more questions: - will there be too much overhead (e.g. memory footprint) for each client? - when is such information downloaded and cached on the client side, and when is it refreshed (is the refresh only triggered by a region server change and a failed fetch, e.g. when the client uses its cache to access machine A for region B but finds nothing, so it must refresh its cache to see which machine owns region B)? regards, Lin
Re: client cache for all region server information?
I think for the refresh case, the client first uses the older region server derived from its cache. It then connects to that older region server, which responds with a failure code. The client then talks to ZooKeeper and then the META server to find the new region server for that key, and reissues the original request to the new region server. Btw, the client only caches information as needed for its queries, not necessarily for 'all' region servers. Abhishek i Sent from my iPad with iMstakes
Re: How to avoid stop-the-world GC for HBase Region Server under big heap size
Hi, For a possible future, there is as well this to monitor: http://docs.oracle.com/javase/7/docs/technotes/guides/vm/G1.html (more or less requires JDK 1.7). See HBASE-2039. Cheers, N.
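On the original question of collectors that don't pause all application threads: as a sketch only, the concurrent (CMS) collector is typically wired up through HBASE_OPTS in conf/hbase-env.sh. The exact values below are illustrative assumptions, not tuned recommendations:

```shell
# conf/hbase-env.sh -- illustrative values, not a tuned recommendation
export HBASE_OPTS="$HBASE_OPTS \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+CMSParallelRemarkEnabled"

# Or, on JDK 1.7+, the G1 collector mentioned above (see HBASE-2039):
# export HBASE_OPTS="$HBASE_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=100"
```

Note that CMS still has short stop-the-world phases (initial mark and remark), so it shortens pauses rather than eliminating them; with a 64G heap mostly holding block cache, off-heap approaches like the slab cache above attack the root cause more directly.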
Re: How to query by rowKey-infix
Hi Anil, to restrict data to a certain time window I also set a timerange for the scan. I'm slightly shocked by the processing time of more than 2 minutes to return 225 rows; I would actually need a response in 5-10 sec. In your timestamp-based filtering, do you check the timestamp as part of the row key, or do you use the put timestamp (as I do)? How many rows are scanned/touched by your timestamp-based filtering? Is it a full table scan where each row's key is checked against a given timestamp/timerange? My use case of obtaining data via a substring comparator operates on the row key. It can't be replaced by setting the time range in my case, really. Btw. the scan is additionally restricted to a certain timerange to increase skipping of irrelevant files and thus improve performance. regards, Christian - Original message - From: anil gupta anilgupt...@gmail.com To: user@hbase.apache.org; Christian Schäfer syrious3...@yahoo.de CC: Sent: 20:42 Wednesday, 22 August 2012 Subject: Re: How to query by rowKey-infix Hi Christian, I had similar requirements to yours, so until now I have used timestamps for filtering the data, and I would say the performance is satisfactory. Here are the results of timestamp-based filtering: the table has 34 million records (average row size is 1.21 KB); in 136 seconds I get the entire result of a query which had 225 rows. I am running an HBase 0.92, 8-node cluster on the VMware hypervisor. Each node has 3.2 GB of memory and 500 GB of HDFS space. Each hard drive in my set-up hosts 2 slave instances (2 VMs running DataNode, NodeManager, RegionServer). I have only allocated 1200 MB for the RSs. I haven't done any modification to the block size of HDFS or HBase. Considering the below-par hardware configuration of the cluster, I feel the performance is OK, and IMO it'll be better than a substring comparator on column values, since with a substring comparator filter you are essentially doing a FULL TABLE scan. 
Whereas in a timerange-based scan you can *skip store files*. On a side note, Alex created a JIRA for enhancing the current FuzzyRowFilter to do range-based filtering as well. Here is the link: https://issues.apache.org/jira/browse/HBASE-6618 . You are more than welcome to chime in. HTH, Anil Gupta On Thu, Aug 9, 2012 at 1:55 PM, Christian Schäfer syrious3...@yahoo.de wrote: Nice. Thanks Alex for sharing your experiences with that custom filter implementation. Currently I'm still using a key filter with a substring comparator. As soon as I have a good amount of test data I will measure the performance of that naive substring filter in comparison to your fuzzy row filter. regards, Christian From: Alex Baranau alex.barano...@gmail.com To: user@hbase.apache.org; Christian Schäfer syrious3...@yahoo.de Sent: 22:18 Thursday, 9 August 2012 Subject: Re: How to query by rowKey-infix jfyi: documented FuzzyRowFilter usage here: http://bit.ly/OXVdbg. Will add documentation to the HBase book very soon [1] Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] https://issues.apache.org/jira/browse/HBASE-6526 On Fri, Aug 3, 2012 at 6:14 PM, Alex Baranau alex.barano...@gmail.com wrote: Good! Submitted an initial patch of the fuzzy row key filter at https://issues.apache.org/jira/browse/HBASE-6509. You can just copy the filter class, include it in your code, and use it in your setup as any other custom filter (no need to patch HBase). Please let me know if you try it out (or post your comments at HBASE-6509). Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Aug 3, 2012 at 5:23 AM, Christian Schäfer syrious3...@yahoo.de wrote: Hi Alex, thanks a lot for the hint about setting the timestamp of the put. I didn't know that this was possible, but it solves the problem (the first test was successful). 
So I'm really glad that I don't need to apply a filter to extract the time and so on for every row. Nevertheless I would like to see your custom filter implementation. It would be nice if you could provide it to help me get a bit into it. And yes, that helped :) regards Chris From: Alex Baranau alex.barano...@gmail.com To: user@hbase.apache.org; Christian Schäfer syrious3...@yahoo.de Sent: 0:57 Friday, 3 August 2012 Subject: Re: How to query by rowKey-infix Hi Christian! If we put secondary indexes aside and assume you are going with heavy scans, you can try the two following things to make it much faster, if they are appropriate to your situation, of course. 1. Is there a more elegant way to collect rows within time range X? (Unfortunately, the date attribute is not equal to the timestamp that is stored by HBase automatically.) Can you set the timestamp of the Puts to the one you have in the row key? Instead of relying
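For readers new to the fuzzy-filter idea, the core matching rule from HBASE-6509 can be sketched in plain Java (a hypothetical helper, not the actual FuzzyRowFilter class): a mask marks each pattern byte as fixed (0, the row byte must match) or free (1, any row byte is accepted), which is what lets a scan match an infix while skipping a leading, varying prefix.

```java
public class FuzzyMatchSketch {
    // Returns true if 'row' matches 'pattern' under 'mask':
    // mask[i] == 0 -> row[i] must equal pattern[i]; mask[i] == 1 -> any byte matches.
    static boolean fuzzyMatch(byte[] row, byte[] pattern, byte[] mask) {
        if (row.length < pattern.length) return false;
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed key layout for illustration: 4 varying prefix bytes, then a
        // 4-byte infix we want to match regardless of the prefix.
        byte[] pattern = {0, 0, 0, 0, 'a', 'b', 'c', 'd'};
        byte[] mask    = {1, 1, 1, 1,  0,   0,   0,   0}; // prefix is free
        byte[] rowHit  = {9, 9, 9, 9, 'a', 'b', 'c', 'd'};
        byte[] rowMiss = {9, 9, 9, 9, 'a', 'b', 'c', 'x'};
        System.out.println(fuzzyMatch(rowHit, pattern, mask));  // true
        System.out.println(fuzzyMatch(rowMiss, pattern, mask)); // false
    }
}
```

The real filter additionally uses the mask to compute the next possible matching row key, so the scan can seek forward instead of touching every row; this sketch only shows the per-row match decision.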
Re: Hbase Shell: UnsatisfiedLinkError
2012/8/22 Stack st...@duboce.net On Wed, Aug 22, 2012 at 4:39 AM, o brbrs obr...@gmail.com wrote: Thanks for your reply. I sent this issue to the user mailing list, but I haven't got any reply. I have installed JDK 1.6 and HBase 0.94, and have made the configuration changes described in http://hbase.apache.org/book.html#configuration. But the error continues. Suggest you go googling for an answer. This is a general jruby jffi dependency issue -- our shell is jruby -- unsatisfied in your environment. (For example, this link has a user running IBM's JVM, which could be the cause of the missing link: http://www.digipedia.pl/usenet/thread/13899/1438/). St.Ack Thanks for your reply. I fixed the problem by replacing the jffi and ffi folders in jruby-complete-1.0.6.jar with the folders from jruby-complete-1.0.7.jar and repackaging jruby-complete-1.0.6.jar. It works. -- ... Obrbrs
HTable batch execution order
Hi, I have a question about the HTable.batch(List<? extends Row> actions, Object[] results) API. According to the javadoc: The ordering of execution of the actions is not defined. Meaning if you do a Put and a Get in the same batch call, you will not necessarily be guaranteed that the Get returns what the Put had put. However, my question is: if I don't mix the actions and only provide Get actions, do I get the results in the same order in which the Gets were provided? E.g. if I provide 3 Gets with row keys [r1, r2, r3], will I get [result1, result2, result3]? Thanks Shagun Agarwal
Re: client cache for all region server information?
Thank you Abhishek, Two more comments, -- Client only caches information as needed for its queries and not necessarily for 'all' region servers. -- how does the client know which region server information needs to be cached, in the current HBase implementation? -- When does the client load region server information for the first time? Does the client persistently cache region server information on the client side? regards, Lin
Re: backup strategies
Let's say I have a huge table and I want to back it up onto a system with a lot of disk space. Would this work: take all the keys and export the database in chunks by selectively picking a range? For instance, if the keys run from 0-100000, I would back up keys 0-50000 into backup_dir_A and 50001-100000 into backup_dir_B. Would that be feasible? On Wed, Aug 22, 2012 at 6:48 AM, Rita rmorgan...@gmail.com wrote: what is the typical conversion process? My biggest worry is that I come from a higher version of HBase to a lower version of HBase, say CDH4 to CDH3U1. On Thu, Aug 16, 2012 at 7:53 AM, Paul Mackles pmack...@adobe.com wrote: Hi Rita By default, the export that ships with HBase writes KeyValue objects to a sequence file. It is a very simple app, and it wouldn't be hard to roll your own export program to write to whatever format you wanted. You can use the current export program as a basis and just change the output of the mapper. I will say that I spent a lot of time thinking about backups and DR, and I didn't really worry much about HBase versions. The file formats for HBase don't change that often, and when they do, there is usually a pretty straightforward conversion process. Also, if you are doing something like full daily backups, then I am having trouble imagining a scenario where you would need to restore from anything but the most recent backup. Depending on which version of HBase you are using, there are probably much bigger issues with using export for backups that you should worry about, like being able to restore in a timely fashion, preserving deletes, and the impact of the backup process on your SLA. Paul On 8/16/12 7:31 AM, Rita rmorgan...@gmail.com wrote: I am sure this topic has been visited many times, but I thought I'd ask to see if anything has changed. We are using HBase with close to 40b rows, and backing up the data is non-trivial. 
We can export a table to another Hadoop/HDFS filesystem, but I am not aware of any guaranteed way of preserving data from one version of HBase to another (specifically if it's very old). Is there a program which will serialize the data into JSON/XML and dump it on a Unix filesystem? Once I get the data we can compress it however we like and back it up using our internal software. -- --- Get your facts first, then you can distort them as you please.--
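For reference, a hedged sketch of invoking the stock Export tool Paul mentions; the table name and paths below are placeholders, not values from this thread:

```shell
# Usage: Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
hbase org.apache.hadoop.hbase.mapreduce.Export mytable /backups/mytable 1

# The output is an HDFS directory of sequence files; copy it off-cluster
# and compress it with whatever tooling you prefer:
hadoop fs -copyToLocal /backups/mytable /mnt/backup_storage/mytable
```

The optional starttime/endtime arguments allow incremental exports by timestamp window, which is one way to approximate the chunked backup scheme asked about above (by time rather than by key range).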
Re: how client location a region/tablet?
For further information about the catalog tables and region-regionserver assignment, see this: http://hbase.apache.org/book.html#arch.catalog On 8/19/12 7:36 AM, Lin Ma lin...@gmail.com wrote: Thank you Stack, especially for the smart 6 round trip guess for the puzzle. :-) 1. Yeah, we client cache's locations, not the data. -- does it mean for each client, it will cache all location information of a HBase cluster, i.e. which physical server owns which region? Supposing each region has 128M bytes, for a big cluster (P-bytes level), total data size / 128M is not a trivial number, not sure if any overhead to client? 2. A bit confused by what do you mean not the data? For the client cached location information, it should be the data in table METADATA, which is region / physical server mapping data. Why you say not data (do you mean real content in each region)? regards, Lin On Sun, Aug 19, 2012 at 12:40 PM, Stack st...@duboce.net wrote: On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote: Hello guys, I am referencing the Big Table paper about how a client locates a tablet. In section 5.1 Tablet location, it is mentioned that client will cache all tablet locations, I think it means client will cache root tablet in METADATA table, and all other tablets in METADATA table (which means client cache the whole METADATA table?). My question is, whether HBase implements in the same or similar way? My concern or confusion is, supposing each tablet or region file is 128M bytes, it will be very huge space (i.e. memory footprint) for each client to cache all tablets or region files of METADATA table. Is it doable or feasible in real HBase clusters? Thanks. Yeah, we client cache's locations, not the data. 
BTW: another confusion from me is in the paper of Big Table section 5.1 Tablet location, it is mentioned that If the client's cache is stale, the location algorithm could take up to six round-trips, because stale cache entries are only discovered upon misses (assuming that METADATA tablets do not move very frequently)., I do not know how the 6 times round trip time is calculated, if anyone could answer this puzzle, it will be great. :-) I'm not sure what the 6 is about either. Here is a guesstimate: 1. Go to cached location for a server for a particular user region, but server says that it does not have the region; the client's location is stale 2. Go back to the client's cached meta region that holds the user region w/ the row we want, but its location is stale. 3. Go to root location, to find the new location of meta, but the root location has moved; what the client has is stale 4. Find new root location and do lookup of meta region location 5. Go to meta region location to find new user region 6. Go to server w/ user region St.Ack
Re: Choose the location of a record
Blaise, Generally speaking, no. The distribution of row keys over regions is handled by HBase. This is as you would want, so that the failure of any given server is transparent to your application. There are ways to hack around this, but generally you shouldn't design in such a way as to require that. What's the requirement motivating your question? Ian On Aug 23, 2012, at 7:57 AM, Blaise NGONMANG kaledjebla...@yahoo.fr wrote: Hi I just want to know if it is possible to select the server where we want to insert a record. Regards Blaise -- View this message in context: http://old.nabble.com/Choose-the-location-of-a-record-tp34339260p34339260.html Sent from the HBase User mailing list archive at Nabble.com.
Re: client cache for all region server information?
Hi Lin, On Thu, Aug 23, 2012 at 4:31 PM, Lin Ma lin...@gmail.com wrote: Thank you Abhishek, Two more comments, -- Client only caches information as needed for its queries and not necessarily for 'all' region servers. -- how did client know which region server information is necessary to be cached in current HBase implementation? What Abhishek meant here is that it caches only the needed table's rows from META. It also only caches the specific region required for the row you're looking up/operating on, AFAICT. -- When the client loads region server information for the first time? Did client persistent cache information at client side about region server information? The client loads up regionserver information for a table when it is requested to perform an operation on that table (on a specific row or the whole). It does not immediately, upon initialization, cache the whole of META's contents. Your question makes sense though, in that a client *may* use quite a bit of memory space in trying to cache the META entries locally, but practically we've not had this cause issues for users yet. The amount of memory used to cache META entries is far outweighed by the other items the client caches (scan results, etc.). At least I have not seen any reports of excessive client memory usage just due to region locations of tables being cached. I think there are more benefits to storing/caching it than not doing so, and so far we've not needed the extra complexity of persisting the cache to a local or non-RAM storage rather than keeping it in memory. -- Harsh J
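To make the cache-on-need behavior concrete, here is a minimal plain-Java sketch of such a client-side location cache, keyed by region start key, where a lookup takes the greatest start key less than or equal to the row key. All names are illustrative assumptions, not the actual HBase client internals, and the sketch ignores region end keys and cache invalidation:

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLocationCacheSketch {
    // Maps a region's start row key to the server hosting that region.
    private final TreeMap<String, String> cache = new TreeMap<>();

    void put(String regionStartKey, String server) {
        cache.put(regionStartKey, server);
    }

    // Find the cached server for a row, or null on a cache miss.
    // (A miss is what would trigger a META lookup in a real client.)
    String locate(String rowKey) {
        Map.Entry<String, String> e = cache.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        RegionLocationCacheSketch c = new RegionLocationCacheSketch();
        c.put("", "serverA");  // region covering ["", "m")
        c.put("m", "serverB"); // region covering ["m", end)
        System.out.println(c.locate("banana")); // serverA
        System.out.println(c.locate("zebra"));  // serverB
    }
}
```

Only the entries that lookups have actually needed are ever inserted, which matches the need-basis growth described above; a stale entry is discovered only when the contacted server rejects the request.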
Re: HTable batch execution order
Hi Shagun, The original ordering index is still maintained: yes, you will get them back in order. Don't be confused by that javadoc statement. The result list is ordered in the same way as the actions list, but the order in which the actions are executed depends on variable things, hence the statement that the Get may not return what the Put, in the same batch, had put. -- Harsh J
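The point about result ordering can be illustrated with a small plain-Java mock (not the HBase client itself): even when the actions execute in an arbitrary order, each result is written at the index of its originating action, so the caller always sees results aligned with the actions list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BatchOrderSketch {
    // Mock batch: executes actions in an arbitrary order, but always
    // stores each result at the index of its originating action.
    static Object[] batch(List<String> actions) {
        Object[] results = new Object[actions.size()];
        Integer[] order = new Integer[actions.size()];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Collections.shuffle(Arrays.asList(order)); // undefined execution order
        for (int i : order) {
            results[i] = "value-of-" + actions.get(i); // result keeps its slot
        }
        return results;
    }

    public static void main(String[] args) {
        Object[] r = batch(Arrays.asList("r1", "r2", "r3"));
        System.out.println(Arrays.toString(r)); // [value-of-r1, value-of-r2, value-of-r3]
    }
}
```

No matter how the shuffle reorders execution, the printed array is always in the callers' action order, which is the guarantee Harsh describes.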
Re: client cache for all region server information?
Harsh, thanks for the detailed information. Two more comments, 1. I want to confirm my understanding is correct. At the beginning the client cache has nothing; when the client issues a request for a table, if the region server location is not known, it will request from the root META region to get region server information step by step, then cache that information. If the cache already contains the requested region information, it will be used directly from the cache. In this way, the cache grows on cache misses for requested region information; 2. far outweighs the other items it caches (scan results, etc.), you mean the GET API of HBase caches results? Sorry, I was not aware of this feature before. How are the results cached, and can we control it (supposing a client is doing a random read pattern, we do not want to cache information since each read may be a unique row-key access)? I'd appreciate it if you could point me to some more detailed information. regards, Lin
Re: how client location a region/tablet?
Doug, very informative document. Thanks a lot! I read through it and have some thoughts, - Supposing at the beginning the client-side cache for region information is empty, and the client wants to GET row-key 123 from table ABC; - The client will read from the ROOT table first. But unfortunately, the ROOT table only contains region information for the META table (please correct me if I am wrong), not region information for real data tables (e.g. table ABC); - Does the client have to call each META region server one by one, in order to find which META region contains information about the region owner of row-key 123 of data table ABC? BTW: I think if there were a way to expose what range of table/region each META region contains from the .META. region key, it would save the time of iterating over META region servers one by one. Please feel free to correct me if I am wrong. regards, Lin
Re: how client location a region/tablet?
Doug, Some more thoughts, after reading the data structure of HRegionInfo = http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html: the start key and end key look informative, and we could leverage them, - I am not sure if we could leverage this information (stored as part of the value in the ROOT table) to find which META region may contain region server information for row-key 123 of data table ABC; - But I think, unfortunately, the information is stored in the value of the ROOT table rather than in its key field, so we would have to iterate over each row in the ROOT table one by one to figure out which META region server to access. Not sure if I got the point right. Please feel free to correct me. regards, Lin
Re: client cache for all region server information?
Hi Lin, On Thu, Aug 23, 2012 at 7:56 PM, Lin Ma lin...@gmail.com wrote: Harsh, thanks for the detailed information. Two more comments, 1. I want to confirm my understanding is correct. At the beginning the client cache has nothing; when the client issues a request for a table, if the region server location is not known, it will request from the root META region to get region server information step by step, then cache that information. If the cache already contains the requested region information, it will be used directly from the cache. In this way, the cache grows on cache misses for requested region information; You have it correct now. Region locations are cached only if they are not already available, and they are cached on a need basis, not all at once. 2. far outweighs the other items it caches (scan results, etc.), you mean the GET API of HBase caches results? Sorry, I was not aware of this feature before. How are the results cached, and can we control it (supposing a client is doing a random read pattern, we do not want to cache information since each read may be a unique row-key access)? I'd appreciate it if you could point me to some more detailed information. Am speaking of Scanner value caching, not Gets exactly. See more about Scanner (client) caching at http://hbase.apache.org/book.html#perf.hbase.client.caching
Does the client persistently cache region server information on the client side?

The client loads up region server information for a table when it is requested to perform an operation on that table (on a specific row or on the whole table). It does not, immediately upon initialization, cache the whole of META's contents. Your question makes sense though: it does seem that a client *may* use quite a bit of memory trying to cache the META entries locally, but in practice we've not had this cause issues for users yet. The amount of memory cached for META far outweighs the other items it caches (scan results, etc.). At least I have not seen any reports of excessive client memory usage just due to region locations of tables being cached. I think there are more benefits to storing/caching it than not doing so, and so far we've not needed the extra complexity of persisting the cache to local or non-RAM storage rather than keeping it in memory.

-- Harsh J
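The lookup-on-miss behaviour described in this thread can be sketched as a toy model (this is illustrative Python, not the actual HConnection code; all names here are invented):

```python
class RegionLocationCache:
    """Toy model of the client-side region location cache: locations are
    fetched (from META) only on a miss, and only for the region that
    actually serves the requested row."""

    def __init__(self, meta_lookup):
        # meta_lookup(table, row) -> (start_key, end_key, server); stands in
        # for the ROOT -> META walk the client performs on a miss
        self.meta_lookup = meta_lookup
        self.cache = {}    # table -> list of (start_key, end_key, server)
        self.misses = 0

    def locate(self, table, row):
        for start, end, server in self.cache.get(table, []):
            if start <= row < end:
                return server              # cache hit, no META round trip
        self.misses += 1                   # miss: ask META, then remember
        region = self.meta_lookup(table, row)
        self.cache.setdefault(table, []).append(region)
        return region[2]
```

Repeated reads of rows in an already-located region never touch `meta_lookup` again, which is why the cache grows only with the set of regions the client actually uses.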
Re: how client location a region/tablet?
HBase currently keeps a single META region (it doesn't split it). ROOT holds the META region's location, and META has a few rows in it, a few of them for each table. See also the class MetaScanner.

On Thu, Aug 23, 2012 at 9:00 PM, Lin Ma lin...@gmail.com wrote: Dong, Some more thoughts, after reading the data structure of HRegionInfo (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html), the start key and end key look informative and we could leverage them,

- I am not sure if we could leverage this information (stored as part of the value in table ROOT) to find which META region may contain the region server information for row-key 123 of data table ABC;
- But I think, unfortunately, the information is stored in the value of table ROOT rather than in its key field, so we would have to iterate over each row in the ROOT table one by one to figure out which META region server to access. Not sure if I get the points. Please feel free to correct me.

regards, Lin

On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma lin...@gmail.com wrote: Doug, very informative document. Thanks a lot! I read through it and have some thoughts,

- Supposing at the beginning the client-side cache for region information is empty, and the client wants to GET row-key 123 from table ABC;
- The client will read from the ROOT table first. But unfortunately, the ROOT table only contains region information for the META table (please correct me if I am wrong), not for real data tables (e.g. table ABC);
- Does the client have to call each META region server one by one, in order to find which META region contains information about the region owner of row-key 123 of data table ABC?

BTW: I think if there were a way to expose, in the .META. region key, what range of table/region each META region covers, it would save the time of iterating over META region servers one by one. Please feel free to correct me if I am wrong.
regards, Lin

On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil doug.m...@explorysmedical.com wrote: For further information about the catalog tables and region-regionserver assignment, see this: http://hbase.apache.org/book.html#arch.catalog

On 8/19/12 7:36 AM, Lin Ma lin...@gmail.com wrote: Thank you Stack, especially for the smart 6-round-trip guess for the puzzle. :-)

1. "Yeah, the client caches locations, not the data." -- does it mean each client will cache all location information of an HBase cluster, i.e. which physical server owns which region? Supposing each region has 128M bytes, for a big cluster (PB level), total data size / 128M is not a trivial number; is there any overhead concern for the client?

2. A bit confused by what you mean by "not the data"? The client-cached location information should be the data in table METADATA, which is the region / physical server mapping data. Why do you say "not data" (do you mean the real content of each region)?

regards, Lin

On Sun, Aug 19, 2012 at 12:40 PM, Stack st...@duboce.net wrote: On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote: Hello guys, I am referencing the Big Table paper about how a client locates a tablet. In section 5.1 Tablet location, it is mentioned that the client will cache all tablet locations; I think it means the client will cache the root tablet of the METADATA table, and all other tablets of the METADATA table (which means the client caches the whole METADATA table?). My question is whether HBase is implemented in the same or a similar way. My concern, or confusion, is that supposing each tablet or region file is 128M bytes, it would be a very large memory footprint for each client to cache all tablet or region entries of the METADATA table. Is that doable or feasible in real HBase clusters? Thanks.

Yeah, the client caches locations, not the data.
BTW: another confusion of mine: in the Big Table paper, section 5.1 Tablet location, it is mentioned that "If the client's cache is stale, the location algorithm could take up to six round-trips, because stale cache entries are only discovered upon misses (assuming that METADATA tablets do not move very frequently)." I do not know how the six round trips are counted; if anyone could answer this puzzle, it would be great. :-)

I'm not sure what the 6 is about either. Here is a guesstimate:
1. Go to the cached location of the server for a particular user region, but the server says it does not have the region; the client's location is stale
2. Go back to the client's cached META region that holds the user region w/ the row we want, but its location is stale too
3. Go to the root location to find the new location of META, but the root location has moved; what the client has is stale
4. Find the new root location and do a lookup of the META region location
5. Go to the META region location to find the new user region
6. Go to the server w/ the user region

St.Ack
-- Harsh J
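Stack's guesstimate can be written down as a straight tally: one trip to discover each of the three stale cache layers, plus three trips to rebuild the path from ROOT back down to the user region (a toy illustration of the list above, nothing HBase-specific):

```python
def worst_case_round_trips():
    """Count the trips in the fully-stale scenario from this thread."""
    stale_discoveries = [
        "cached regionserver no longer hosts the user region",
        "cached META region no longer holds the row's entry",
        "cached ROOT location has moved",
    ]
    rebuild_steps = [
        "fetch the new ROOT location and look up META there",
        "ask META for the user region's new location",
        "issue the request to the right regionserver",
    ]
    return len(stale_discoveries) + len(rebuild_steps)
```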
Re: How to avoid stop-the-world GC for HBase Region Server under big heap size
On Wed, Aug 22, 2012 at 11:06 PM, Gen Liu ge...@zynga.com wrote: Hi, We are running Region Server on big memory machine (70G) and set Xmx=64G. Most heap is used as block cache for random read. Stop-the-world GC is killing the region server, but using less heap (16G) doesn't utilize our machines well. Is there a concurrent or parallel GC option that won't block all threads? Any thought is appreciated. Thanks. Have you tried tuning the JVM at all? What are the options that you are running with? You have GC logs enabled? Post a few up on pastebin? As Mohamed asks, you've the slab allocator enabled? What are your configs like? How many regions per server? What size are they? St.Ack
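For reference on the JVM-options question: no stock collector of this era is fully pause-free, but CMS keeps most of the work concurrent. Below is a sketch of the kind of hbase-env.sh settings people start from; the exact values are assumptions that must be tuned against your own GC logs, and a 64G heap can still see long CMS remark or promotion-failure pauses:

```shell
# hbase-env.sh -- mostly-concurrent GC, plus the GC logging requested above
export HBASE_OPTS="$HBASE_OPTS \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/var/log/hbase/gc-regionserver.log"
```

Starting CMS at 70% occupancy gives a heap dominated by block cache some headroom so collections finish before the heap fills and falls back to a stop-the-world full GC.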
Re: Thrift2 interface
Hey Joe,

We have tried a few different things wrt the C++ clients and Thrift. Just putting out some of our thoughts here.

First, we used the existing Thrift proxy as a separate tier (Thrift proxy tier). The issue there was that we just didn't get enough throughput (for various reasons). Independently, adoption of HBase from C++ was increasing, so we thought it made sense to write a native client. So we wrote the native C++ client and embedded the Thrift proxy into the region server (embedded Thrift proxy). Cutting the redirect from the client was one gain (as the native client is a smart client), but the real advantage came from short-circuiting the flow. In the Thrift proxy tier case, the Thrift client would talk to the proxy using Thrift serialization, the proxy would deserialize the Thrift call and re-serialize it into the Java client format, then send it to the region server, which would deserialize the Java-formatted buffers again. But with the embedded proxy + native client, we can short-circuit on the embedded proxy and make a function call into the region server, which runs in the same JVM (which cuts out one round of serialization and deserialization).

The issues with the Thrift-based approach, however, are that the Java objects (HTable, Scan, Get, Put, etc.) are not Thrift definitions, so they need to be maintained as a separate (and often very different) set of APIs and updated every time there is an enhancement on the Java side. The proxy tier also has to be configured/tuned/bug-fixed separately from the region server to keep it as performant as the region server, as the overall system will perform like the slowest component in the stack.

The ideal solution (IMHO) is a C++ client that speaks a protocol compatible with the Java client, so that there are no significant perf differences between the two and there is no separate proxy to tune. Just a thought of course; it might be hard to achieve.
Of course we have just talked about this :) but with the move to protocol buffers in trunk, this should be easier. Out of curiosity, why thrift2 -- do you specifically need Thrift APIs to the region servers? Why not an efficient C/C++ client for HBase?

Thanks Karthik

On 8/22/12 4:06 PM, Joe Pallas joseph.pal...@oracle.com wrote: On Aug 21, 2012, at 9:29 AM, Stack wrote: On Mon, Aug 20, 2012 at 6:18 PM, Joe Pallas joseph.pal...@oracle.com wrote: Anyone out there actively using the thrift2 interface in 0.94? Thrift bindings for C++ don't seem to handle optional arguments too well (that is to say, it seems that optional arguments are not optional). Unfortunately, checkAndPut uses an optional argument for value to distinguish between the two cases (value must match vs. no cell with that column qualifier). Any clues on how to work around that difficulty would be welcome.

If you make a patch, we'll commit it Joe.

Well, I think the patch really needs to be in Thrift; the only workaround I can see is to restructure the hbase.thrift interface file to avoid having routines with optional arguments. It seems a shame to break compatibility with existing clients for that, and I am not sure there is a way to do it without breaking compatibility. (On the other hand, we're talking about thrift2, so it isn't as if there are many existing clients.) The state of Thrift documentation is lamentable. The original white paper is the most detailed information I can find about compatibility rules. It has enough information to tell me that Thrift doesn't support overloading of routine names within a service, because the names are the identifiers used to identify the routines. I think that means it isn't possible to make a compatible change that would affect only the client side.

Have you seen this? https://github.com/facebook/native-cpp-hbase-client Would it help?
The native client stuff is certainly interesting, but, as near as I can tell, it expects the in-region-server Thrift server, which I would like to give a chance to mature a bit before playing with. I'm also puzzled by the hbase.thrift file in that repository. It seems to be based on the older HBase Thrift interface, but it adds some functions. I can't see how a client could use them, though, since there are no HBase-side patches. Anyone involved with FB's native client efforts care to enlighten me? joe
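To make the workaround discussed above concrete: since Thrift identifies routines by name and cannot overload them, the optional-value checkAndPut would have to become two distinctly named routines in hbase.thrift. This is only a sketch; the routine names and argument lists here are invented for illustration, not the real thrift2 IDL:

```thrift
service THBaseService {
  /** Put only if the existing cell for family:qualifier equals `value`. */
  bool checkAndPut(1: binary table, 2: binary row, 3: binary family,
                   4: binary qualifier, 5: binary value, 6: TPut put),

  /** Put only if no cell exists for family:qualifier (the old
      "optional value omitted" case). */
  bool checkAndPutIfAbsent(1: binary table, 2: binary row, 3: binary family,
                           4: binary qualifier, 5: TPut put),
}
```

As noted in the thread, renaming routines this way breaks existing thrift2 clients, which is exactly the compatibility cost being weighed.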
Re: how client location a region/tablet?
Lin,

On Thu, Aug 23, 2012 at 10:10 PM, Lin Ma lin...@gmail.com wrote: Thanks, Harsh! - HBase currently keeps a single META region (Doesn't split it). -- does it mean there is only one row in the ROOT table, which points to the only META region?

Yes, currently this is the case. We disabled multiple META regions at some point; I am unsure exactly why, but perhaps it was complex to maintain.

- In Big Table, it seems they have multiple META regions (tablets); is it an advantage over HBase? :-)

Well, it depends. A single META region hasn't proven to be a scalability bottleneck for anyone yet. A single META region can easily serve millions of rows if needed, like any other region, and I've usually not seen the META table grow that big in deployments.

-- Harsh J
RE: how client location a region/tablet?
I too thought there were multiple META regions but just one ROOT. Maybe I am mixing up Big Table and HBase.

Thanks, Abhishek

-----Original Message----- From: Lin Ma [mailto:lin...@gmail.com] Sent: Thursday, August 23, 2012 9:41 AM To: user@hbase.apache.org; ha...@cloudera.com Cc: doug.m...@explorysmedical.com Subject: Re: how client location a region/tablet?

Thanks, Harsh! - HBase currently keeps a single META region (Doesn't split it). -- does it mean there is only one row in ROOT table, which points the only one META region? - In Big Table, it seems they have multiple META regions (tablets), is it an advantage over HBase? :-) regards, Lin

On Thu, Aug 23, 2012 at 11:48 PM, Harsh J ha...@cloudera.com wrote: HBase currently keeps a single META region (Doesn't split it). ROOT holds META region location, and META has a few rows in it, a few of them for each table. See also the class MetaScanner.

On Thu, Aug 23, 2012 at 9:00 PM, Lin Ma lin...@gmail.com wrote: Dong, Some more thoughts, after reading data structure for HRegionInfo (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html), start key and end key look informative which we could leverage, - I am not sure if we could leverage this information (stored as part of value in table ROOT) to find which META region may contain region server information for row-key 123 of data table ABC; - But I think unfortunately the information is stored in value of table ROOT, rather than the key field of table ROOT, so that we have to iterate each row in ROOT table one by one to figure out which META region server to access. Not sure if I get the points. Please feel free to correct me. regards, Lin

On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma lin...@gmail.com wrote: Doug, very informative document. Thanks a lot!
I read through it and have some thoughts,

- Supposing at the beginning, client side cache for region information is empty, and the client wants to GET row-key 123 from table ABC;
- The client will read from ROOT table at first. But unfortunately, ROOT table only contains region information for META table (please correct me if I am wrong), but not region information for real data table (e.g. table ABC);
- Does the client have to call each META region server one by one, in order to find which META region contains information for region owner of row-key 123 of data table ABC?

BTW: I think if there is a way to expose information about what range of table/region each META region contains from .META. region key, it will be better to save time to iterate META region server one by one. Please feel free to correct me if I am wrong.

regards, Lin

On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil doug.m...@explorysmedical.com wrote: For further information about the catalog tables and region-regionserver assignment, see this: http://hbase.apache.org/book.html#arch.catalog

On 8/19/12 7:36 AM, Lin Ma lin...@gmail.com wrote: Thank you Stack, especially for the smart 6 round trip guess for the puzzle. :-)

1. Yeah, we client cache's locations, not the data. -- does it mean for each client, it will cache all location information of a HBase cluster, i.e. which physical server owns which region? Supposing each region has 128M bytes, for a big cluster (P-bytes level), total data size / 128M is not a trivial number, not sure if any overhead to client?

2. A bit confused by what do you mean not the data? For the client cached location information, it should be the data in table METADATA, which is region / physical server mapping data. Why you say not data (do you mean real content in each region)?
regards, Lin

On Sun, Aug 19, 2012 at 12:40 PM, Stack st...@duboce.net wrote: On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote: Hello guys, I am referencing the Big Table paper about how a client locates a tablet. In section 5.1 Tablet location, it is mentioned that client will cache all tablet locations, I think it means client will cache root tablet in METADATA table, and all other tablets in METADATA table (which means client cache the whole METADATA table?). My question is, whether HBase implements in the same or similar way? My concern or confusion is, supposing each tablet or region file is 128M bytes, it will be very huge space (i.e. memory footprint) for each client to cache all tablets or region files of METADATA table. Is it doable or feasible in real HBase clusters? Thanks.

Yeah, we client cache's locations, not the data.

BTW: another confusion from me is in the paper of Big Table section 5.1 Tablet location, it is mentioned that If the client's cache is stale, the location
Client receives SocketTimeoutException (CallerDisconnected on RS)
Hi there,

While I'm performing read-intensive benchmarks, I'm seeing a storm of CallerDisconnectedExceptions on certain RegionServers. As the documentation says, my client received a SocketTimeoutException (6ms etc...) at the same time. It's always happening and I get very poor read performance (from 10 to 5000 reads/sec) in a 10-node cluster. My benchmark consists of several iterations launching 10, 100 and 1000 Get requests on given random rowkeys with a single CF/qualifier. I'm using HBase 0.94.1 (a few commits before the official stable release) with Hadoop 1.0.3. Bloom filters have been enabled (at the rowkey level). I cannot find very clear information about these exceptions. From the reference guide: "(...) you should consider digging in a bit more if you aren't doing something to trigger them." Well... could you help me dig? :-) -- AM.
Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Hi Adrien,

I would love to see the region server side of the logs while those socket timeouts happen; also check the GC log. But one thing people often hit while doing pure random read workloads with tons of clients is running out of sockets, because they are all stuck in CLOSE_WAIT. You can check that by using lsof. There are other discussions on this mailing list about it.

J-D

On Thu, Aug 23, 2012 at 10:24 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: Hi there, While I'm performing read-intensive benchmarks, I'm seeing storm of CallerDisconnectedException in certain RegionServers. As the documentation says, my client received a SocketTimeoutException (6ms etc...) at the same time. It's always happening and I get very poor read-performances (from 10 to 5000 reads/sc) in a 10 nodes cluster. My benchmark consists in several iterations launching 10, 100 and 1000 Get requests on a given random rowkey with a single CF/qualifier. I'm using HBase 0.94.1 (a few commits before the official stable release) with Hadoop 1.0.3. Bloom filters have been enabled (at the rowkey level). I do not find very clear informations about these exceptions. From the reference guide : (...) you should consider digging in a bit more if you aren't doing something to trigger them. Well... could you help me digging? :-) -- AM.
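A quick way to test the CLOSE_WAIT theory on a regionserver host (the pgrep pattern is an assumption about how the process is named; adjust to your setup):

```shell
RS_PID=$(pgrep -f HRegionServer | head -n 1)
# sockets stuck in CLOSE_WAIT for the regionserver process
lsof -p "$RS_PID" 2>/dev/null | grep -c CLOSE_WAIT
# compare against the open-file limit the process is running under
ulimit -n
```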
Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Hi Adrien, As well, if you can share the client code (number of threads, regions, is it a set of single get, or are they multi gets, this kind of stuff). Cheers, N. On Thu, Aug 23, 2012 at 7:40 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Hi Adrien, I would love to see the region server side of the logs while those socket timeouts happen, also check the GC log, but one thing people often hit while doing pure random read workloads with tons of clients is running out of sockets because they are all stuck in CLOSE_WAIT. You can check that by using lsof. There are other discussion on this mailing list about it. J-D On Thu, Aug 23, 2012 at 10:24 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: Hi there, While I'm performing read-intensive benchmarks, I'm seeing storm of CallerDisconnectedException in certain RegionServers. As the documentation says, my client received a SocketTimeoutException (6ms etc...) at the same time. It's always happening and I get very poor read-performances (from 10 to 5000 reads/sc) in a 10 nodes cluster. My benchmark consists in several iterations launching 10, 100 and 1000 Get requests on a given random rowkey with a single CF/qualifier. I'm using HBase 0.94.1 (a few commits before the official stable release) with Hadoop 1.0.3. Bloom filters have been enabled (at the rowkey level). I do not find very clear informations about these exceptions. From the reference guide : (...) you should consider digging in a bit more if you aren't doing something to trigger them. Well... could you help me digging? :-) -- AM.
Re: Client receives SocketTimeoutException (CallerDisconnected on RS)
Hi guys,

1/ I checked the GC logs quickly and saw nothing. Since I need very fast lookups I set the zookeeper.session.timeout parameter to 10s, to consider the RS as dead after very short pauses, and that did not occur.

2/ I did not check, but I don't think I ran out of sockets since the ulimit has been set very high. I'll check!

3/ The benchmark can launch several R/W threads, but even the simplest program leads to my issue (randomKey() is just a stand-in for my key generator):

  Configuration config = HBaseConfiguration.create();
  HTable table = new HTable(config, "test");
  List<Get> getsList = new ArrayList<Get>();
  for (int i = 0; i < n; i++) {   // n = 1, 10, 100 or 1000
    getsList.add(new Get(randomKey()));
  }
  Result[] results = table.get(getsList);
  table.close();

4/ I will share more logs tomorrow to dig deeper; I personally need a long STW-pause :-)

Cheers,

On Thu, Aug 23, 2012 at 7:49 PM, N Keywal nkey...@gmail.com wrote: Hi Adrien, As well, if you can share the client code (number of threads, regions, is it a set of single get, or are they multi gets, this kind of stuff). Cheers, N.

On Thu, Aug 23, 2012 at 7:40 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Hi Adrien, I would love to see the region server side of the logs while those socket timeouts happen, also check the GC log, but one thing people often hit while doing pure random read workloads with tons of clients is running out of sockets because they are all stuck in CLOSE_WAIT. You can check that by using lsof. There are other discussion on this mailing list about it. J-D

On Thu, Aug 23, 2012 at 10:24 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: Hi there, While I'm performing read-intensive benchmarks, I'm seeing storm of CallerDisconnectedException in certain RegionServers. As the documentation says, my client received a SocketTimeoutException (6ms etc...) at the same time. It's always happening and I get very poor read-performances (from 10 to 5000 reads/sc) in a 10 nodes cluster. My benchmark consists in several iterations launching 10, 100 and 1000 Get requests on a given random rowkey with a single CF/qualifier.
I'm using HBase 0.94.1 (a few commits before the official stable release) with Hadoop 1.0.3. Bloom filters have been enabled (at the rowkey level). I do not find very clear informations about these exceptions. From the reference guide : (...) you should consider digging in a bit more if you aren't doing something to trigger them. Well... could you help me digging? :-) -- AM
Re: HBase row level cache for random read
On 8/18/12 12:33 PM, Stack st...@duboce.net wrote: On Fri, Aug 17, 2012 at 4:42 PM, Gen Liu ge...@zynga.com wrote: I assume the block cache stores compressed data,

Generally it's not compressed, not unless you use block encoding.

Can you be more specific on this? Are you talking about https://issues.apache.org/jira/browse/HBASE-4218 So this is only available in 0.94 then? Thanks.

One block can hold 6 rows, but with random reads maybe only 1 row is ever accessed, so 5/6 of the cache space is wasted. Is there a better way of caching for random reads? Lowering the block size to 32k or even 16k might be a choice.

We don't seem to list this as an option in this section, http://hbase.apache.org/book.html#perf.reading, but yes, with lots of random reads, a smaller block size could make a difference.

St.Ack
Re: HBase row level cache for random read
On Thu, Aug 23, 2012 at 12:06 PM, Gen Liu ge...@zynga.com wrote: On 8/18/12 12:33 PM, Stack st...@duboce.net wrote: On Fri, Aug 17, 2012 at 4:42 PM, Gen Liu ge...@zynga.com wrote: I assume block cache store compressed data, Generally its not, not unless you use block encoding. Can you be more specific on this? Are you talking about https://issues.apache.org/jira/browse/HBASE-4218 So this is only available in 0.94 then? Thanks. one block can hold 6 rows, but in random read, maybe 1 row is ever accessed, 5/6 of the cache space is wasted. Is there a better way of caching for random read. Lower the block size to 32k or even 16k might be a choice. We don't seem to list this as an option in this section, http://hbase.apache.org/book.html#perf.reading, but yes, if lots of random reads, smaller block cache could make a difference. See release note in https://issues.apache.org/jira/browse/HBASE-4218 St.Ack
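The 5/6 figure in this thread is just arithmetic on block size vs. row size. A small sketch (the sizes below are made-up examples) of why smaller blocks waste less cache under purely random single-row reads:

```python
def wasted_cache_fraction(block_size, row_size):
    """Fraction of a cached block never touched when random reads
    only ever hit one row per block."""
    rows_per_block = max(1, block_size // row_size)
    return (rows_per_block - 1) / float(rows_per_block)

# 64 KB blocks holding ~10 KB rows: 6 rows per block, so 5/6 of the cache
# is wasted; 16 KB blocks with the same rows hold 1 row each, so none is.
```

The trade-off with smaller blocks is a proportionally larger block index, which is presumably why it is a tuning knob rather than the default.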
limit on number of blocks per HFile and files per region
Hi,

I have a few questions on blocks per file and files per region.

1. Can there be multiple row keys per block, and then per HFile? Or is a block or HFile dedicated to a single row key? I have a scenario where, for the same column family, some rowkeys will have very wide rows, say rowkey W, and some rowkeys will have very narrow rows, say rowkey N. In my case, puts for rowkeys W and N are interleaved with a ratio of, say, 90 rowkey-W puts vs. 10 rowkey-N puts. On the get side, my app gets data for a single rowkey at a time. Does that mean that for a rowkey N, the entries will be scattered across regions on that same region server, given the interleaved puts? Or is there a way I can enforce contiguous writes to a region/HFile reserved for rowkey N? That way, I could leverage the block cache and have the entire rowkey N (or most of it) fit in there for that session.

2. Is there a limit on the number of HFiles that can exist per region? Basically, on what criteria does a rowkey's data get split into two regions [on the same region server]? I am assuming there can be many regions per region server, and that multiple regions of the same table can live on the same region server.

3. Also, is there a limit on the number of blocks that are created per HFile? What determines whether a split is required?

Thanks, Abhishek
Re: limit on number of blocks per HFile and files per region
Inline. In general I'd recommend you read the documentation more closely and/or get the book.

J-D

On Thu, Aug 23, 2012 at 4:21 PM, Pamecha, Abhishek apame...@x.com wrote: 1. Can there be multiple row keys per block and then per HFile? Or is a block or HFile dedicated to a single row key?

Multiple row keys per HFile block. Read http://hbase.apache.org/book.html#hfilev2

I have a scenario, where for the same column family, some rowkeys will have very wide rows, say rowkey W, and some rowkeys will have very narrow rows, say rowkey N. In my case, puts for rowkeys W and N are interleaved with a ratio of say 90 rowkeyW puts vs 10 rowkeyN puts. On the get side, my app works on getting data for a single rowkey at a time. Will that mean for a rowkeyN, the entries will be scattered across regions on that same region server, given there are interleaved puts? Or Is there a way I can enforce contiguous writes to a region/HFile reserved for rowkey N. This way, I can leverage the block cache and have the entire/most of rowkeyN fit in there for that session.

The row keys are sorted according to their lexicographical order. See http://hbase.apache.org/book.html#row If you don't want the big rows coexisting with the small rows, put them in different column families or different tables.

2. Is there a limit on number of HFiles that can exist per region?

I think your understanding of HFiles being a bit wrong prompted you to ask this; my previous answers probably make it so that you don't need this answer anymore, but here it is just in case: the HFiles are compacted when reaching hbase.hstore.compactionThreshold (default of 3) per family, and you can have no more than hbase.hstore.blockingStoreFiles (default of 7).

Basically, on what criteria does a rowkey data gets split in two regions [on the same region server]. I am assuming there can be many regions per region server. And multiple regions for the same table can belong in the same region server.
A row key only lives in a single region, since regions are split on row-key boundaries.

3. Also, is there a limit on the number of blocks that are created per HFile?

No.

What determines whether a split is required?

hbase.hregion.max.filesize; also see http://hbase.apache.org/book.html#disable.splitting if you want to change that.
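The three settings named in this answer live in hbase-site.xml. The first two values below are the defaults quoted above; the max filesize is only an example figure (check the default for your version):

```xml
<configuration>
  <!-- compact a store's HFiles together once this many accumulate -->
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value>
  </property>
  <!-- block further writes to the store at this many HFiles -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>7</value>
  </property>
  <!-- split the region when a store grows past this many bytes (10 GB here, example only) -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
  </property>
</configuration>
```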
Re: how client location a region/tablet?
Thank you Harsh. You answered my question. I like the current architecture of HBase, which is designed for future extensibility -- we have a two-layer index data structure, and we can utilize it when needed for specific problems. It is like buying a 4-bedroom house but only living in one room before having more children. :-)

regards, Lin

On Fri, Aug 24, 2012 at 12:46 AM, Harsh J ha...@cloudera.com wrote: Lin, On Thu, Aug 23, 2012 at 10:10 PM, Lin Ma lin...@gmail.com wrote: Thanks, Harsh! - HBase currently keeps a single META region (Doesn't split it). -- does it mean there is only one row in ROOT table, which points the only one META region? Yes, currently this is the case. We disabled multiple META regions at some point, I am unsure about why exactly but perhaps it was complex to maintain that. - In Big Table, it seems they have multiple META regions (tablets), is it an advantage over HBase? :-) Well, depends. A single META region hasn't proven as a scalability bottleneck to anyone yet. A single META region can easily serve millions of rows if needed, like any other region, and I've usually not seen META table grow so big in deployments. -- Harsh J
Re: how client location a region/tablet?
Me too, Abhishek -- you are not alone. But it is good to learn and discuss here to know various design choices.

regards, Lin

On Fri, Aug 24, 2012 at 1:06 AM, Pamecha, Abhishek apame...@x.com wrote: I too thought there are multiple meta regions where as just one ROOT. May be I am mixing b/w Big Table and Hbase. Thanks, Abhishek

-----Original Message----- From: Lin Ma [mailto:lin...@gmail.com] Sent: Thursday, August 23, 2012 9:41 AM To: user@hbase.apache.org; ha...@cloudera.com Cc: doug.m...@explorysmedical.com Subject: Re: how client location a region/tablet?

Thanks, Harsh! - HBase currently keeps a single META region (Doesn't split it). -- does it mean there is only one row in ROOT table, which points the only one META region? - In Big Table, it seems they have multiple META regions (tablets), is it an advantage over HBase? :-) regards, Lin

On Thu, Aug 23, 2012 at 11:48 PM, Harsh J ha...@cloudera.com wrote: HBase currently keeps a single META region (Doesn't split it). ROOT holds META region location, and META has a few rows in it, a few of them for each table. See also the class MetaScanner.

On Thu, Aug 23, 2012 at 9:00 PM, Lin Ma lin...@gmail.com wrote: Dong, Some more thoughts, after reading data structure for HRegionInfo (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html), start key and end key looks informative which we could leverage, - I am not sure if we could leverage this information (stored as part of value in table ROOT) to find which META region may contains region server information for row-key 123 of data table ABC; - But I think unfortunately the information is stored in value of table ROOT, other than key field of table ROOT, so that we have to iterate each row in ROOT table one by one to figure out which META region server to access. Not sure if I get the points. Please feel free to correct me. regards, Lin

On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma lin...@gmail.com wrote: Doug, very informative document. Thanks a lot!
I read through it and have some thoughts,

- Supposing at the beginning, client side cache for region information is empty, and the client wants to GET row-key 123 from table ABC;
- The client will read from ROOT table at first. But unfortunately, ROOT table only contains region information for META table (please correct me if I am wrong), but not region information for real data table (e.g. table ABC);
- Does the client have to call each META region server one by one, in order to find which META region contains information for region owner of row-key 123 of data table ABC?

BTW: I think if there is a way to expose information about what range of table/region each META region contains from .META. region key, it will be better to save time to iterate META region server one by one. Please feel free to correct me if I am wrong.

regards, Lin

On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil doug.m...@explorysmedical.com wrote: For further information about the catalog tables and region-regionserver assignment, see this: http://hbase.apache.org/book.html#arch.catalog

On 8/19/12 7:36 AM, Lin Ma lin...@gmail.com wrote: Thank you Stack, especially for the smart 6 round trip guess for the puzzle. :-)

1. Yeah, we client cache's locations, not the data. -- does it mean for each client, it will cache all location information of a HBase cluster, i.e. which physical server owns which region? Supposing each region has 128M bytes, for a big cluster (P-bytes level), total data size / 128M is not a trivial number, not sure if any overhead to client?

2. A bit confused by what do you mean not the data? For the client cached location information, it should be the data in table METADATA, which is region / physical server mapping data. Why you say not data (do you mean real content in each region)?
regards, Lin On Sun, Aug 19, 2012 at 12:40 PM, Stack st...@duboce.net wrote: On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote: Hello guys, I am referencing the Big Table paper about how a client locates a tablet. In section 5.1 Tablet location, it is mentioned that the client will cache all tablet locations. I think that means the client caches the root tablet in the METADATA table and all other tablets in the METADATA table (which means the client caches the whole METADATA table?). My question is whether HBase is implemented in the same or a similar way. My concern or confusion is, supposing each tablet or region file is 128M bytes, it would be a huge amount of space (i.e. memory footprint) for each client to cache all tablets or region files of METADATA
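The lookup-and-cache flow discussed in this thread can be sketched as a toy simulation. Everything below (the `META` dict, server names, region start keys) is invented for illustration: it models the behavior the replies describe (clients cache only locations they have queried, and re-read META when a cached entry turns out stale), not the real HBase client code.

```python
import bisect

# Simulated .META. content for one table: sorted region start keys and the
# server currently hosting each region. A real client reads this from the
# META region (found via ROOT), not from a local dict.
META = {"tableABC": ([b"", b"m"], ["rs1:60020", "rs2:60020"])}

def locate_region(table, row_key):
    """Binary-search region start keys to find the server owning row_key."""
    start_keys, servers = META[table]
    return servers[bisect.bisect_right(start_keys, row_key) - 1]

class Client:
    """Caches locations only for rows actually queried, as Abhishek notes."""
    def __init__(self):
        self.cache = {}  # (table, row_key) -> server address

    def get_server(self, table, row_key, server_ok=lambda s: True):
        server = self.cache.get((table, row_key))
        if server is None or not server_ok(server):
            # Cache miss or stale entry: go back to META and re-cache.
            server = locate_region(table, row_key)
            self.cache[(table, row_key)] = server
        return server
```

A stale entry is refreshed by a `server_ok` check that fails, mirroring the retry path where the old region server returns an error and the client re-reads META before reissuing the request.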
HBase/JRuby update wiki page
The wiki page at http://wiki.apache.org/hadoop/Hbase/JRuby is out of date. I have updated the code so that it works with the late model APIs here: https://github.com/rjurney/enron-jruby-sinatra-hbase-pig/blob/master/hbase_example.rb Can someone please give me edit access on the HBase wiki, so I can fix/update the documentation? Thanks! -- Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
hbase many-to-many design
Hi 'user', This is a many-to-many question; I also referred to the HBase design FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ_Design. What I want to do is design a 'user' table, including the user's basic information (columnFamily1) and the team names the user has joined (columnFamily2). When a user joins a new team, I want to update the 'user' table to add a column to columnFamily2, so that when getting the user I get all the team names the user has joined. Yet I don't want to put duplicate records (i.e. multiple versions); each user should have only one record. What should I do? Any advice will be appreciated! Thanks Best Regards Mike
Re: hbase many-to-many design
If you are adding a new column to the team column family, I don't think multi-versioning comes into the picture. Multi-versioning saves copies of the values of a particular cell, but you are creating a new cell within the same row. Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Aug 24, 2012 at 8:07 AM, jing wang happygodwithw...@gmail.com wrote: Hi 'user', This is a many-to-many question; I also referred to the HBase design FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ_Design. What I want to do is design a 'user' table, including the user's basic information (columnFamily1) and the team names the user has joined (columnFamily2). When a user joins a new team, I want to update the 'user' table to add a column to columnFamily2, so that when getting the user I get all the team names the user has joined. Yet I don't want to put duplicate records (i.e. multiple versions); each user should have only one record. What should I do? Any advice will be appreciated! Thanks Best Regards Mike
Re: hbase many-to-many design
Hi Sonal, Thanks for your reply. How do I add a new column to an existing column family? The method I was going to try takes three steps: first get the record, construct a new Put using the previously fetched columnFamily2, then delete the old record in HBase, and finally put the newly constructed Put into HBase. I really don't think this is a good way. If another put including a new column is sent to HBase, is that an 'update' action or another version? Would you please give me some reference for adding a column to a row? Thanks Best Regards Mike 2012/8/24 Sonal Goyal sonalgoy...@gmail.com If you are adding a new column to the team column family, I don't think multi-versioning comes into the picture. Multi-versioning saves copies of the values of a particular cell, but you are creating a new cell within the same row. Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Aug 24, 2012 at 8:07 AM, jing wang happygodwithw...@gmail.com wrote: Hi 'user', This is a many-to-many question; I also referred to the HBase design FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ_Design. What I want to do is design a 'user' table, including the user's basic information (columnFamily1) and the team names the user has joined (columnFamily2). When a user joins a new team, I want to update the 'user' table to add a column to columnFamily2, so that when getting the user I get all the team names the user has joined. Yet I don't want to put duplicate records (i.e. multiple versions); each user should have only one record. What should I do? Any advice will be appreciated! Thanks Best Regards Mike
Re: hbase many-to-many design
Sorry, is this what you want? I created a table with two column families. I added one row with column family cf1 and qualifier team1. Then I added a new team to cf1 with qualifier team2.

hbase(main):001:0> create 'multi','cf1','cf2'
0 row(s) in 1.6240 seconds
hbase(main):002:0> put 'multi','row1','cf1:team1','firstTeam'
0 row(s) in 0.0880 seconds
hbase(main):003:0> scan 'multi'
ROW     COLUMN+CELL
 row1   column=cf1:team1, timestamp=1345783824219, value=firstTeam
1 row(s) in 0.0540 seconds
hbase(main):004:0> put 'multi','row1','cf1:team2','secondTeam'
0 row(s) in 0.0060 seconds
hbase(main):005:0> scan 'multi'
ROW     COLUMN+CELL
 row1   column=cf1:team1, timestamp=1345783824219, value=firstTeam
 row1   column=cf1:team2, timestamp=1345783846821, value=secondTeam
1 row(s) in 0.0250 seconds

Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Aug 24, 2012 at 10:03 AM, jing wang happygodwithw...@gmail.com wrote: Hi Sonal, Thanks for your reply. How do I add a new column to an existing column family? The method I was going to try takes three steps: first get the record, construct a new Put using the previously fetched columnFamily2, then delete the old record in HBase, and finally put the newly constructed Put into HBase. I really don't think this is a good way. If another put including a new column is sent to HBase, is that an 'update' action or another version? Would you please give me some reference for adding a column to a row? Thanks Best Regards Mike 2012/8/24 Sonal Goyal sonalgoy...@gmail.com If you are adding a new column to the team column family, I don't think multi-versioning comes into the picture. Multi-versioning saves copies of the values of a particular cell, but you are creating a new cell within the same row. 
Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Aug 24, 2012 at 8:07 AM, jing wang happygodwithw...@gmail.com wrote: Hi 'user', This is a many-to-many question; I also referred to the HBase design FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ_Design. What I want to do is design a 'user' table, including the user's basic information (columnFamily1) and the team names the user has joined (columnFamily2). When a user joins a new team, I want to update the 'user' table to add a column to columnFamily2, so that when getting the user I get all the team names the user has joined. Yet I don't want to put duplicate records (i.e. multiple versions); each user should have only one record. What should I do? Any advice will be appreciated! Thanks Best Regards Mike
Re: hbase many-to-many design
Hi Jing, You can add a new column unannounced. This means your current put does not have to recall which other columns are already present in the row or, for that matter, in the table. You just issue a put command as if it were your first one, and the column will be added. Unlike an RDBMS, there are no update or alter-table commands you need to execute to add a new column. If the column you are adding already exists, then a new version of the value you put is stored. Thanks Abhishek i Sent from my iPad with iMstakes On Aug 23, 2012, at 21:34, jing wang happygodwithw...@gmail.com wrote: Hi Sonal, Thanks for your reply. How do I add a new column to an existing column family? The method I was going to try takes three steps: first get the record, construct a new Put using the previously fetched columnFamily2, then delete the old record in HBase, and finally put the newly constructed Put into HBase. I really don't think this is a good way. If another put including a new column is sent to HBase, is that an 'update' action or another version? Would you please give me some reference for adding a column to a row? Thanks Best Regards Mike 2012/8/24 Sonal Goyal sonalgoy...@gmail.com If you are adding a new column to the team column family, I don't think multi-versioning comes into the picture. Multi-versioning saves copies of the values of a particular cell, but you are creating a new cell within the same row. Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Aug 24, 2012 at 8:07 AM, jing wang happygodwithw...@gmail.com wrote: Hi 'user', This is a many-to-many question; I also referred to the HBase design FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ_Design. 
What I want to do is design a 'user' table, including the user's basic information (columnFamily1) and the team names the user has joined (columnFamily2). When a user joins a new team, I want to update the 'user' table to add a column to columnFamily2, so that when getting the user I get all the team names the user has joined. Yet I don't want to put duplicate records (i.e. multiple versions); each user should have only one record. What should I do? Any advice will be appreciated! Thanks Best Regards Mike
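The semantics Abhishek describes can be modeled with a small simulation: a row stores cells keyed by (column family, qualifier), and each cell keeps a list of timestamped versions. Putting an unseen qualifier simply creates a new cell with no schema change, while re-putting an existing qualifier adds a version. This is a toy model of the behavior only, not the HBase client API; all names in it are illustrative.

```python
import itertools
from collections import defaultdict

_ts = itertools.count(1)  # stand-in for HBase's millisecond timestamps

class Row:
    def __init__(self):
        # (family, qualifier) -> [(timestamp, value), ...], newest last
        self.cells = defaultdict(list)

    def put(self, family, qualifier, value):
        # No get/delete/re-put cycle needed: a put just adds a cell
        # (or a new version of an existing cell).
        self.cells[(family, qualifier)].append((next(_ts), value))

    def get(self, family, qualifier):
        # By default a read returns only the newest version.
        versions = self.cells[(family, qualifier)]
        return versions[-1][1] if versions else None
```

So Mike's three-step get/delete/put is unnecessary: one put per new team column is enough, and extra versions only appear if the same qualifier is written again.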
Re: limit on number of blocks per HFile and files per region
Thanks, Jean-Daniel. I did go through the documentation, but there was no clear answer on interleaving puts from two or more row keys, or whether there is a way to reserve contiguous blocks per row key. I made some derivations, but clearly I was incorrect in some of them, as you pointed out too. The questions were partly validations and partly doubt-riddance. :) Thanks Abhishek i Sent from my iPad with iMstakes On Aug 23, 2012, at 17:19, Jean-Daniel Cryans jdcry...@apache.org wrote: Inline. In general I'd recommend you read the documentation more closely and/or get the book. J-D On Thu, Aug 23, 2012 at 4:21 PM, Pamecha, Abhishek apame...@x.com wrote: 1. Can there be multiple row keys per block and then per HFile? Or is a block or HFile dedicated to a single row key? Multiple row keys per HFile block. Read http://hbase.apache.org/book.html#hfilev2 I have a scenario where, for the same column family, some row keys will have very wide rows, say row key W, and some will have very narrow rows, say row key N. In my case, puts for row keys W and N are interleaved with a ratio of say 90 rowkeyW puts to 10 rowkeyN puts. On the get side, my app gets data for a single row key at a time. Will that mean that for rowkeyN the entries will be scattered across regions on that same region server, given the interleaved puts? Or is there a way I can enforce contiguous writes to a region/HFile reserved for row key N? That way, I can leverage the block cache and have the entire/most of rowkeyN fit in there for that session. The row keys are sorted according to their lexicographical order. See http://hbase.apache.org/book.html#row If you don't want the big rows coexisting with the small rows, put them in different column families or different tables. 2. Is there a limit on the number of HFiles that can exist per region? 
I think your understanding of HFiles being a bit wrong prompted you to ask this; my previous answers probably make it so that you don't need this answer anymore, but here it is just in case: the HFiles are compacted when reaching hbase.hstore.compactionThreshold (default of 3) per family, and you can have no more than hbase.hstore.blockingStoreFiles (default of 7). Basically, on what criteria does a row key's data get split into two regions [on the same region server]? I am assuming there can be many regions per region server, and multiple regions for the same table can belong to the same region server. A row key only lives in a single region, since the regions are split based on row keys. 3. Also, is there a limit on the number of blocks that are created per HFile? No. What determines whether a split is required? hbase.hregion.max.filesize; also see http://hbase.apache.org/book.html#disable.splitting if you want to change that.
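Two of J-D's points can be illustrated with a short sketch: row keys sort in lexicographic (byte) order, and a row key lives in exactly one region, the one whose [start, end) key range contains it. The split keys below are invented for illustration; real split points come from hbase.hregion.max.filesize-driven splits.

```python
import bisect

# Region i covers [start_keys[i], start_keys[i+1]); the last region is
# unbounded on the right. These split keys are made up.
start_keys = [b"", b"g", b"p"]

def region_for(row_key):
    """Return the index of the single region that owns row_key."""
    return bisect.bisect_right(start_keys, row_key) - 1

# Lexicographic byte order can surprise numeric keys: b"10" sorts before b"9",
# which is why fixed-width or salted keys are common in HBase schemas.
numeric_order = sorted([b"9", b"10"])
```

Since assignment is purely by sorted key range, interleaved puts for two row keys never scatter one row key across regions: each key always lands in its single owning region.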
Re: hbase many-to-many design
Hi Abhishek, Got it. Thank you very much. Thanks Best Regards Mike 2012/8/24 Pamecha, Abhishek apame...@x.com Hi Jing, You can add a new column unannounced. This means your current put does not have to recall which other columns are already present in the row or, for that matter, in the table. You just issue a put command as if it were your first one, and the column will be added. Unlike an RDBMS, there are no update or alter-table commands you need to execute to add a new column. If the column you are adding already exists, then a new version of the value you put is stored. Thanks Abhishek i Sent from my iPad with iMstakes On Aug 23, 2012, at 21:34, jing wang happygodwithw...@gmail.com wrote: Hi Sonal, Thanks for your reply. How do I add a new column to an existing column family? The method I was going to try takes three steps: first get the record, construct a new Put using the previously fetched columnFamily2, then delete the old record in HBase, and finally put the newly constructed Put into HBase. I really don't think this is a good way. If another put including a new column is sent to HBase, is that an 'update' action or another version? Would you please give me some reference for adding a column to a row? Thanks Best Regards Mike 2012/8/24 Sonal Goyal sonalgoy...@gmail.com If you are adding a new column to the team column family, I don't think multi-versioning comes into the picture. Multi-versioning saves copies of the values of a particular cell, but you are creating a new cell within the same row. Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Aug 24, 2012 at 8:07 AM, jing wang happygodwithw...@gmail.com wrote: Hi 'user', This is a many-to-many question; I also referred to the HBase design FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ_Design. 
What I want to do is desinging a 'user' table, incluing 'user' basic infomations(columnFamily1), and team-name 'user' joined in(columnFamily2), When a user join in a new Team, I want to update the 'user' table to add a column to 'columnFamily2'. So when getting the user , I get all the team-names the user join in. Yet I don't want to put duplicate records, known as multi-versions, each user has only one record. What should I do? Any advice will be appreciated! Thanks Best Regards Mike
Re: hbase many-to-many design
Hi Sonal, Thanks again. I had a misunderstanding of column-oriented HBase. As Abhishek said, solving my problem: *You can add a new column unannounced. This means your current put does not have to recall which other columns are already present in the row or, for that matter, in the table. You just issue a put command as if it were your first one, and the column will be added. Unlike an RDBMS, there are no update or alter-table commands you need to execute to add a new column. If the column you are adding already exists, then a new version of the value you put is stored.* Thanks, Mike 2012/8/24 Sonal Goyal sonalgoy...@gmail.com Sorry, is this what you want? I created a table with two column families. I added one row with column family cf1 and qualifier team1. Then I added a new team to cf1 with qualifier team2.

hbase(main):001:0> create 'multi','cf1','cf2'
0 row(s) in 1.6240 seconds
hbase(main):002:0> put 'multi','row1','cf1:team1','firstTeam'
0 row(s) in 0.0880 seconds
hbase(main):003:0> scan 'multi'
ROW     COLUMN+CELL
 row1   column=cf1:team1, timestamp=1345783824219, value=firstTeam
1 row(s) in 0.0540 seconds
hbase(main):004:0> put 'multi','row1','cf1:team2','secondTeam'
0 row(s) in 0.0060 seconds
hbase(main):005:0> scan 'multi'
ROW     COLUMN+CELL
 row1   column=cf1:team1, timestamp=1345783824219, value=firstTeam
 row1   column=cf1:team2, timestamp=1345783846821, value=secondTeam
1 row(s) in 0.0250 seconds

Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Aug 24, 2012 at 10:03 AM, jing wang happygodwithw...@gmail.com wrote: Hi Sonal, Thanks for your reply. How do I add a new column to an existing column family? The method I was going to try takes three steps: first get the record, construct a new Put using the previously fetched columnFamily2, then delete the old record in HBase, and finally put the newly constructed Put into HBase. I really don't think this is a good way. 
If another put including a new column is sent to HBase, is that an 'update' action or another version? Would you please give me some reference for adding a column to a row? Thanks Best Regards Mike 2012/8/24 Sonal Goyal sonalgoy...@gmail.com If you are adding a new column to the team column family, I don't think multi-versioning comes into the picture. Multi-versioning saves copies of the values of a particular cell, but you are creating a new cell within the same row. Best Regards, Sonal Crux: Reporting for HBase https://github.com/sonalgoyal/crux Nube Technologies http://www.nubetech.co http://in.linkedin.com/in/sonalgoyal On Fri, Aug 24, 2012 at 8:07 AM, jing wang happygodwithw...@gmail.com wrote: Hi 'user', This is a many-to-many question; I also referred to the HBase design FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ_Design. What I want to do is design a 'user' table, including the user's basic information (columnFamily1) and the team names the user has joined (columnFamily2). When a user joins a new team, I want to update the 'user' table to add a column to columnFamily2, so that when getting the user I get all the team names the user has joined. Yet I don't want to put duplicate records (i.e. multiple versions); each user should have only one record. What should I do? Any advice will be appreciated! Thanks Best Regards Mike