Potential bugs in HTable In incrementColumnValue method
Hi HBase community, Can anyone confirm that the method incrementColumnValue is implemented correctly? I'm mainly talking about the deprecated method: @Deprecated @Override public long incrementColumnValue(final byte [] row, final byte [] family, final byte [] qualifier, final long amount, final boolean writeToWAL) throws IOException { return incrementColumnValue(row, family, qualifier, amount, writeToWAL? Durability.SKIP_WAL: Durability.USE_DEFAULT); } Note from the above that if writeToWAL is true, Durability is set to SKIP_WAL. That does not make sense to me, so I'm asking whether this might be a potential bug. Best Regards, Jerry
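For reference, a minimal sketch of what the mapping presumably should look like, with the two branches of the ternary swapped so that writeToWAL=true keeps the WAL. This is only my reading of the intent, not necessarily the fix that ended up being committed:

    @Deprecated
    @Override
    public long incrementColumnValue(final byte[] row, final byte[] family,
        final byte[] qualifier, final long amount, final boolean writeToWAL)
        throws IOException {
      // only writeToWAL == false should skip the WAL
      return incrementColumnValue(row, family, qualifier, amount,
          writeToWAL ? Durability.SYNC_WAL : Durability.SKIP_WAL);
    }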
Re: Potential bugs in HTable In incrementColumnValue method
Hi Vlad, I copied the code from HBase version 1.0.0. I first noticed it in version 0.98.6. We have code that has used HBase since 0.92, and some of it has not been ported to the latest version, so it still uses the deprecated methods. The reason I'm asking is that I don't know whether I should use SKIP_WAL to get the same semantics as writeToWAL (true). I doubt it, because the name SKIP_WAL implies writeToWAL false. :) Best Regards, Jerry On Tue, Jun 9, 2015 at 12:03 PM, Ted Yu yuzhih...@gmail.com wrote: I see code in this formation in 0.98 branch. Looking at the unit tests which exercise incrementColumnValue(), they all call: public long incrementColumnValue(final byte [] row, final byte [] family, final byte [] qualifier, final long amount) Possibly because the one mentioned by Jerry is deprecated. FYI On Tue, Jun 9, 2015 at 8:49 AM, Vladimir Rodionov vladrodio...@gmail.com wrote: Hi, Jerry Which version of HBase is it? -Vlad On Tue, Jun 9, 2015 at 8:05 AM, Jerry Lam chiling...@gmail.com wrote: Hi HBase community, Can anyone confirm that the method incrementColumnValue is implemented correctly? I'm talking about mainly the deprecated method: @Deprecated @Override public long incrementColumnValue(final byte [] row, final byte [] family, final byte [] qualifier, final long amount, final boolean writeToWAL) throws IOException { return incrementColumnValue(row, family, qualifier, amount, writeToWAL? Durability.SKIP_WAL: Durability.USE_DEFAULT); } Note from the above, if writeToWAL is true, Durability is set to SKIP_WAL. It does not make sense to me so I'm asking if this might be a potential bug. Best Regards, Jerry
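On the caller side, a workaround that avoids the deprecated overload entirely is to use the Increment API and set the durability explicitly. A hedged sketch, assuming the 0.98+/1.0 client API and that SYNC_WAL is the durability you want in place of the old writeToWAL=true behaviour:

    Increment inc = new Increment(row);
    inc.addColumn(family, qualifier, amount);
    inc.setDurability(Durability.SYNC_WAL);   // or Durability.USE_DEFAULT
    Result r = table.increment(inc);
    long newValue = Bytes.toLong(r.getValue(family, qualifier));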
Re: Potential bugs in HTable In incrementColumnValue method
Done. Thanks everyone for confirming this! HBASE-13881 https://issues.apache.org/jira/browse/HBASE-13881 On Tue, Jun 9, 2015 at 7:09 PM, Ted Yu yuzhih...@gmail.com wrote: Did a quick search in HBase JIRA - no hit. Jerry: Mind logging one ? Thanks On Tue, Jun 9, 2015 at 3:30 PM, Andrew Purtell apurt...@apache.org wrote: Is there a JIRA for this? On Tue, Jun 9, 2015 at 11:15 AM, Ted Yu yuzhih...@gmail.com wrote: Seems a bug to me w.r.t. interpretation of writeToWAL Cheers On Tue, Jun 9, 2015 at 10:50 AM, Jerry Lam chiling...@gmail.com wrote: Hi Vlad, I copied the code from HBase version 1.0.0. I first noticed it in version 0.98.6. We have codes that use HBase since 0.92. So some of the codes have not been ported to the latest version therefore they are still using the deprecated methods. The reason I'm asking is because I don't know if I should use SKIP_WAL to get the same semantic of writeToWAL (true). I'm doubting it because the name SKIP_WAL implies writeToWAL false. :) Best Regards, Jerry On Tue, Jun 9, 2015 at 12:03 PM, Ted Yu yuzhih...@gmail.com wrote: I see code in this formation in 0.98 branch. Looking at the unit tests which exercise incrementColumnValue(), they all call: public long incrementColumnValue(final byte [] row, final byte [] family, final byte [] qualifier, final long amount) Possibly because the one mentioned by Jerry is deprecated. FYI On Tue, Jun 9, 2015 at 8:49 AM, Vladimir Rodionov vladrodio...@gmail.com wrote: Hi, Jerry Which version of HBase is it? -Vlad On Tue, Jun 9, 2015 at 8:05 AM, Jerry Lam chiling...@gmail.com wrote: Hi HBase community, Can anyone confirm that the method incrementColumnValue is implemented correctly? I'm talking about mainly the deprecated method: @Deprecated @Override public long incrementColumnValue(final byte [] row, final byte [] family, final byte [] qualifier, final long amount, final boolean writeToWAL) throws IOException { return incrementColumnValue(row, family, qualifier, amount, writeToWAL? Durability.SKIP_WAL: Durability.USE_DEFAULT); } Note from the above, if writeToWAL is true, Durability is set to SKIP_WAL. It does not make sense to me so I'm asking if this might be a potential bug. Best Regards, Jerry -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
How to change MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles?
Hi HBase users, Does anyone know how to change MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles? The default value is 32, which is quite small. HBase version: 0.98 Thank you, Jerry
Re: How to change MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles?
Hi Matteo, Thank you for the info. I tried it, but it doesn't seem to take effect. Apparently the code in LoadIncrementalHFiles does not pick up anything other than the variables from hbase-site.xml, which is unfortunate. We have more than 32 HFiles to bulk load, so this is really not working... Best Regards, Jerry On Wed, Aug 20, 2014 at 10:49 AM, Matteo Bertozzi theo.berto...@gmail.com wrote: you should be able to use the -D option to set the new value LoadIncrementalHFiles -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=NEW_VALUE Matteo On Wed, Aug 20, 2014 at 3:46 PM, Jerry Lam chiling...@gmail.com wrote: Hi HBase users, I wonder if anyone knows how to make change to the MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles? The default value is 32 which is quite small. HBase Version 0.98 Thank you, Jerry
Re: How to change MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles?
Hi Matteo, Thank you for addressing the issue. For now, I will just set the variable in hbase-site.xml. Best Regards, Jerry On Wed, Aug 20, 2014 at 12:33 PM, Matteo Bertozzi theo.berto...@gmail.com wrote: yeah sorry, just looked at the code and it is not initializing the tool correctly to pickup the -D configuration. let me fix that, I've opened HBASE-11789 as you said with the current code only the hbase-site.xml conf is used, so you need to set the property there. Matteo On Wed, Aug 20, 2014 at 5:24 PM, Jerry Lam chiling...@gmail.com wrote: Hi Matteo, Thank you for the info. I tried it but it doesn't seem to take any effect. Apparently the code in the LoadIncremtnalHFiles does not take anything other than variables from hbase-site.xml which is unfortunate. We have more than 32 hfiles to bulkload. So this is really not working... Best Regards, Jerry On Wed, Aug 20, 2014 at 10:49 AM, Matteo Bertozzi theo.berto...@gmail.com wrote: you should be able to use the -D option to set the new value LoadIncrementalHFiles -Dhbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily=NEW_VALUE Matteo On Wed, Aug 20, 2014 at 3:46 PM, Jerry Lam chiling...@gmail.com wrote: Hi HBase users, I wonder if anyone knows how to make change to the MAX_FILES_PER_REGION_PER_FAMILY in LoadIncrementalHFiles? The default value is 32 which is quite small. HBase Version 0.98 Thank you, Jerry
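Until HBASE-11789 lands, a hedged sketch of the workaround Matteo describes: put the property (name taken from his earlier reply) into hbase-site.xml on the machine running the bulk load, with a limit large enough for your HFile count (128 here is only an example):

    <property>
      <name>hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily</name>
      <value>128</value>
    </property>

If you drive the load programmatically rather than through the command-line tool, setting the same property on the Configuration you pass to LoadIncrementalHFiles (for example conf.setInt("hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily", 128)) should have the same effect.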
Re: Performance between HBaseClient scan and HFileReaderV2
Hi Tom, Good point. Note that I also ran the HBaseClient performance test several times (as you can see from the chart). The caching should also have benefited the second run of the HBaseClient test, not just the HFileReaderV2 test. I still don't understand what makes the HBaseClient perform so poorly compared to accessing HDFS directly. I could understand maybe a factor of 2 (even that seems too much), but a factor of 8 is quite unreasonable. Any hint? Jerry On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood tom.w.h...@gmail.com wrote: I'm also new to HBase and am not familiar with HFileReaderV2. However, in your description, you didn't mention anything about clearing the linux OS cache between tests. That might be why you're seeing the big difference if you ran the HBaseClient test first, it may have warmed the OS cache and then HFileReaderV2 benefited from it. Just a guess... -- Tom On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam chiling...@gmail.com wrote: Hello HBase users, I just ran a very simple performance test and would like to see if what I experienced make sense. The experiment is as follows: - I filled a hbase region with 700MB data (each row has roughly 45 columns and the size is 20KB for the entire row) - I configured the region to hold 4GB (therefore no split occurs) - I ran compactions after the data is loaded and make sure that there is only 1 region in the table under test. - No other table exists in the hbase cluster because this is a DEV environment - I'm using HBase 0.92.1 The test is very basic. I use HBaseClient to scan the entire region to retrieve all rows and all columns in the table, just iterating all KeyValue pairs until it is done. It took about 1 minute 22 sec to complete. (Note that I disable block cache and uses caching size about 1). I ran another test using HFileReaderV2 and scan the entire region to retrieve all rows and all columns, just iterating all keyValue pairs until it is done. It took 11 sec. The performance difference is dramatic (almost 8 times faster using HFileReaderV2). I want to know why the difference is so big or I didn't configure HBase properly. From this experiment, HDFS can deliver the data efficiently so it is not the bottleneck. Any help is appreciated! Jerry
Re: Performance between HBaseClient scan and HFileReaderV2
Hello St.Ack, I would like to switch to 0.94 but we are using 0.92.1 and we will not change until the end of 2014. I can change the client of HBase (e.g. AsyncHBase) if this is the bottleneck. If the problem is server side (e.g. regionserver), are there anything I can do to improve the performance? Best Regards, Jerry On Thu, Jan 2, 2014 at 11:23 AM, Stack st...@duboce.net wrote: On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam chiling...@gmail.com wrote: Hello HBase users, I just ran a very simple performance test and would like to see if what I experienced make sense. The experiment is as follows: - I filled a hbase region with 700MB data (each row has roughly 45 columns and the size is 20KB for the entire row) - I configured the region to hold 4GB (therefore no split occurs) - I ran compactions after the data is loaded and make sure that there is only 1 region in the table under test. - No other table exists in the hbase cluster because this is a DEV environment - I'm using HBase 0.92.1 Can you use a 0.94? It has had some scanner improvements. Thanks, St.Ack The test is very basic. I use HBaseClient to scan the entire region to retrieve all rows and all columns in the table, just iterating all KeyValue pairs until it is done. It took about 1 minute 22 sec to complete. (Note that I disable block cache and uses caching size about 1). I ran another test using HFileReaderV2 and scan the entire region to retrieve all rows and all columns, just iterating all keyValue pairs until it is done. It took 11 sec. The performance difference is dramatic (almost 8 times faster using HFileReaderV2). I want to know why the difference is so big or I didn't configure HBase properly. From this experiment, HDFS can deliver the data efficiently so it is not the bottleneck. Any help is appreciated! Jerry
Re: Performance between HBaseClient scan and HFileReaderV2
Hello Vladimir, In my use case, I guarantee that a major compaction is executed before any scan happens because the system we build is a read only system. There will have no deleted cells. Additionally, I only need to read from a single column family and therefore I don't need to access multiple HFiles. Filter conditions are nice to have because if I can read HFile 8x faster than using HBaseClient, I can do the filter on the client side and still perform faster than using HBaseClient. Thank you for your input! Jerry On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov vrodio...@carrieriq.comwrote: HBase scanner MUST guarantee correct order of KeyValues (coming from different HFile's), filter condition+ filter condition on included column families and qualifiers, time range, max versions and correctly process deleted cells. Direct HFileReader does nothing from the above list. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Jerry Lam [chiling...@gmail.com] Sent: Thursday, January 02, 2014 7:56 AM To: user Subject: Re: Performance between HBaseClient scan and HFileReaderV2 Hi Tom, Good point. Note that I also ran the HBaseClient performance test several times (as you can see from the chart). The caching should also benefit the second time I ran the HBaseClient performance test not just benefitting the HFileReaderV2 test. I still don't understand what makes the HBaseClient performs so poorly in comparison to access directly HDFS. I can understand maybe a factor of 2 (even that it is too much) but a factor of 8 is quite unreasonable. Any hint? Jerry On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood tom.w.h...@gmail.com wrote: I'm also new to HBase and am not familiar with HFileReaderV2. However, in your description, you didn't mention anything about clearing the linux OS cache between tests. That might be why you're seeing the big difference if you ran the HBaseClient test first, it may have warmed the OS cache and then HFileReaderV2 benefited from it. Just a guess... -- Tom On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam chiling...@gmail.com wrote: Hello HBase users, I just ran a very simple performance test and would like to see if what I experienced make sense. The experiment is as follows: - I filled a hbase region with 700MB data (each row has roughly 45 columns and the size is 20KB for the entire row) - I configured the region to hold 4GB (therefore no split occurs) - I ran compactions after the data is loaded and make sure that there is only 1 region in the table under test. - No other table exists in the hbase cluster because this is a DEV environment - I'm using HBase 0.92.1 The test is very basic. I use HBaseClient to scan the entire region to retrieve all rows and all columns in the table, just iterating all KeyValue pairs until it is done. It took about 1 minute 22 sec to complete. (Note that I disable block cache and uses caching size about 1). I ran another test using HFileReaderV2 and scan the entire region to retrieve all rows and all columns, just iterating all keyValue pairs until it is done. It took 11 sec. The performance difference is dramatic (almost 8 times faster using HFileReaderV2). I want to know why the difference is so big or I didn't configure HBase properly. From this experiment, HDFS can deliver the data efficiently so it is not the bottleneck. Any help is appreciated! 
Jerry
Re: Performance between HBaseClient scan and HFileReaderV2
Hello Sergey and Enis, Thank you for the pointer! HBASE-8691 will definitely help. HBASE-10076 (Very interesting/exciting feature by the way!) is what I need. How can I port it to 0.92.x if it is at all possible? I understand that my test is not realistic however since I have only 1 region with 1 HFile (this is by design), so there should not have any merge sorted read going on. One thing I'm not sure is that since I use snappy compression, does the value of the KeyValue is decompress at the region server? If yes, I think it is quite inefficient because the decompression can be done at the client side. Saving bandwidth saves a lot of time for the type of workload I'm working on. Best Regards, Jerry On Thu, Jan 2, 2014 at 5:02 PM, Enis Söztutar e...@apache.org wrote: Nice test! There is a couple of things here: (1) HFileReader reads only one file, versus, an HRegion reads multiple files (into the KeyValueHeap) to do a merge scan. So, although there is only one file, there is some overehead of doing a merge sort'ed read from multiple files in the region. For a more realistic test, you can try to do the reads using HRegion directly (instead of HFileReader). The overhead is not that much though in my tests. (2) For scanning with client API, the results have to be serialized and deserialized and send over the network (or loopback for local). This is another overhead that is not there in HfileReader. (3) HBase scanner RPC implementation is NOT streaming. The RPC works like fetching batch size (1) records, and cannot fully saturate the disk and network pipeline. In my tests for MapReduce over snapshot files (HBASE-8369), I have measured 5x difference, because of layers (2) and (3). Please see my slides at http://www.slideshare.net/enissoz/mapreduce-over-snapshots I think we can do a much better job at (3), see HBASE-8691. However, there will always be some overhead, although it should not be 5-8x. As suggested above, in the meantime, you can take a look at the patch for HBASE-8369, and https://issues.apache.org/jira/browse/HBASE-10076 to see whether it suits your use case. Enis On Thu, Jan 2, 2014 at 1:43 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Er, using MR over snapshots, which reads files directly... https://issues.apache.org/jira/browse/HBASE-8369 However, it was only committed to 98. There was interest in 94 port (HBASE-10076), but it never happened... On Thu, Jan 2, 2014 at 1:42 PM, Sergey Shelukhin ser...@hortonworks.com wrote: You might be interested in using https://issues.apache.org/jira/browse/HBASE-8369 However, it was only committed to 98. There was interest in 94 port (HBASE-10076), but it never happened... On Thu, Jan 2, 2014 at 1:32 PM, Jerry Lam chiling...@gmail.com wrote: Hello Vladimir, In my use case, I guarantee that a major compaction is executed before any scan happens because the system we build is a read only system. There will have no deleted cells. Additionally, I only need to read from a single column family and therefore I don't need to access multiple HFiles. Filter conditions are nice to have because if I can read HFile 8x faster than using HBaseClient, I can do the filter on the client side and still perform faster than using HBaseClient. Thank you for your input! 
Jerry On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov vrodio...@carrieriq.comwrote: HBase scanner MUST guarantee correct order of KeyValues (coming from different HFile's), filter condition+ filter condition on included column families and qualifiers, time range, max versions and correctly process deleted cells. Direct HFileReader does nothing from the above list. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Jerry Lam [chiling...@gmail.com] Sent: Thursday, January 02, 2014 7:56 AM To: user Subject: Re: Performance between HBaseClient scan and HFileReaderV2 Hi Tom, Good point. Note that I also ran the HBaseClient performance test several times (as you can see from the chart). The caching should also benefit the second time I ran the HBaseClient performance test not just benefitting the HFileReaderV2 test. I still don't understand what makes the HBaseClient performs so poorly in comparison to access directly HDFS. I can understand maybe a factor of 2 (even that it is too much) but a factor of 8 is quite unreasonable. Any hint? Jerry On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood tom.w.h...@gmail.com wrote: I'm also new to HBase and am not familiar with HFileReaderV2. However, in your description, you didn't mention anything about clearing the linux
Re: Performance between HBaseClient scan and HFileReaderV2
Hello Lars, Yes, I used setCaching for getting more KeyValues in each RPC call. Also yes, when I used HFileReaderV2 I still reading from HDFS. Short circuiting is enabled but I don't know how to ensure it has been used (Is there log that can tell me if it has been used?). I did made sure the HBaseClient runs on the same regionserver that holds the data. I just tried asynchbase (as I'm running out of ideas, I started to try everything), it takes 60 seconds to scan through the data (20 seconds less than using HBaseClient). Best Regards, Jerry On Thu, Jan 2, 2014 at 4:44 PM, lars hofhansl la...@apache.org wrote: From the below I gather you set scanner caching (Scan.setCaching(...))? When you use HFileReaderV2, you're still reading from HDFS, right? Are you using short circuit reading (avoiding network IO)? In the HBaseClient client you pipe all the data through the network again. Is the HBaseClient located on a different machine? I would use a profiler (just use jVisualVM, which ships with the JDK and use the sampling profiler) to see where the time is spent. Lastly, to echo what other folks have said, 0.92 is pretty old at this point and I personally added a lot of performance improvements to HBase during the 0.94 timeframe and other's have as well. If you could test the same with 0.94, I'd be very interested in the numbers. -- Lars From: Jerry Lam chiling...@gmail.com To: user user@hbase.apache.org Sent: Thursday, January 2, 2014 1:32 PM Subject: Re: Performance between HBaseClient scan and HFileReaderV2 Hello Vladimir, In my use case, I guarantee that a major compaction is executed before any scan happens because the system we build is a read only system. There will have no deleted cells. Additionally, I only need to read from a single column family and therefore I don't need to access multiple HFiles. Filter conditions are nice to have because if I can read HFile 8x faster than using HBaseClient, I can do the filter on the client side and still perform faster than using HBaseClient. Thank you for your input! Jerry On Thu, Jan 2, 2014 at 1:30 PM, Vladimir Rodionov vrodio...@carrieriq.comwrote: HBase scanner MUST guarantee correct order of KeyValues (coming from different HFile's), filter condition+ filter condition on included column families and qualifiers, time range, max versions and correctly process deleted cells. Direct HFileReader does nothing from the above list. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com From: Jerry Lam [chiling...@gmail.com] Sent: Thursday, January 02, 2014 7:56 AM To: user Subject: Re: Performance between HBaseClient scan and HFileReaderV2 Hi Tom, Good point. Note that I also ran the HBaseClient performance test several times (as you can see from the chart). The caching should also benefit the second time I ran the HBaseClient performance test not just benefitting the HFileReaderV2 test. I still don't understand what makes the HBaseClient performs so poorly in comparison to access directly HDFS. I can understand maybe a factor of 2 (even that it is too much) but a factor of 8 is quite unreasonable. Any hint? Jerry On Sun, Dec 29, 2013 at 9:09 PM, Tom Hood tom.w.h...@gmail.com wrote: I'm also new to HBase and am not familiar with HFileReaderV2. However, in your description, you didn't mention anything about clearing the linux OS cache between tests. 
That might be why you're seeing the big difference if you ran the HBaseClient test first, it may have warmed the OS cache and then HFileReaderV2 benefited from it. Just a guess... -- Tom On Mon, Dec 23, 2013 at 12:18 PM, Jerry Lam chiling...@gmail.com wrote: Hello HBase users, I just ran a very simple performance test and would like to see if what I experienced make sense. The experiment is as follows: - I filled a hbase region with 700MB data (each row has roughly 45 columns and the size is 20KB for the entire row) - I configured the region to hold 4GB (therefore no split occurs) - I ran compactions after the data is loaded and make sure that there is only 1 region in the table under test. - No other table exists in the hbase cluster because this is a DEV environment - I'm using HBase 0.92.1 The test is very basic. I use HBaseClient to scan the entire region to retrieve all rows and all columns in the table, just iterating all KeyValue pairs until it is done. It took about 1 minute 22 sec to complete. (Note that I disable block cache and uses caching size about 1). I ran another test using HFileReaderV2 and scan the entire region to retrieve all rows and all columns, just iterating all keyValue pairs
Performance between HBaseClient scan and HFileReaderV2
Hello HBase users, I just ran a very simple performance test and would like to see if what I experienced makes sense. The experiment is as follows: - I filled an HBase region with 700MB of data (each row has roughly 45 columns and the entire row is about 20KB) - I configured the region to hold 4GB (therefore no split occurs) - I ran compactions after the data was loaded and made sure that there is only 1 region in the table under test - No other table exists in the HBase cluster because this is a DEV environment - I'm using HBase 0.92.1 The test is very basic. I use HBaseClient to scan the entire region, retrieving all rows and all columns in the table and just iterating over all KeyValue pairs until it is done. It took about 1 minute 22 sec to complete. (Note that I disabled the block cache and used a scanner caching size of about 1.) I ran another test using HFileReaderV2 to scan the entire region, again retrieving all rows and all columns and iterating over all KeyValue pairs until done. It took 11 sec. The performance difference is dramatic (almost 8 times faster using HFileReaderV2). I want to know why the difference is so big, or whether I didn't configure HBase properly. From this experiment, HDFS can deliver the data efficiently, so it is not the bottleneck. Any help is appreciated! Jerry
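For context, a minimal sketch of the client-side scan described above (table name and configuration are placeholders; the Scan settings mirror the post: block cache disabled, scanner caching of 1, which means one row per RPC):

    HTable table = new HTable(conf, "test_table");   // "test_table" is a placeholder
    Scan scan = new Scan();
    scan.setCacheBlocks(false);   // block cache disabled, as in the test
    scan.setCaching(1);           // rows fetched per RPC; larger values usually help a lot
    ResultScanner scanner = table.getScanner(scan);
    long cells = 0;
    for (Result r : scanner) {
      for (KeyValue kv : r.raw()) {
        cells++;                  // just iterate every KeyValue
      }
    }
    scanner.close();
    table.close();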
Re: Nosqls schema design
Hi Nick: Your question is a good and tough one. I haven't found anything that really helps guide schema design in the NoSQL world. There are general concepts, but none of them is close to SQL schema design, where you can apply rules to guide your decision. The best presentation I have found about the general concepts of HBase schema design is http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-hbasecon-2012.html (search for Schema Design). From this presentation you can learn why it is so difficult to come up with a suggestion for your problem, and pick up some best practices to start your own design. HTH, Jerry On Thu, Nov 8, 2012 at 10:17 AM, Nick maillard nicolas.maill...@fifty-five.com wrote: Thanks for the answers. I'm trying to really make sense of NoSQL and HBase in particular. The software part has a lot of loopholes and I'm still fighting off the compaction storm issue, so right now I would not say HBase is fast when it comes to writing. But my post was more about NoSQL schema thoughts; after so long on SQL schemas it takes a little time to stop thinking that way, in terms of schema but also in terms of questions, or of interaction if you'd rather. So contrary to SQL, I cannot design a logical model for the data and figure out later what I'll want out of it. In my case I stated 10 TB, but this is very likely to grow since it is the starting scenario. I do believe a 30-minute latency before ingesting logs is not an issue; however, the questions to HBase must be answered in a real-time manner. I have been trying to play with my questions and see how they can fit in a row key and/or column families, but since they are different in nature and purpose I ended up supposing they would land in a number of different HBase tables in order to address the scope of questions. One table for one to three questions. The questions have joins and filters embedded in them. My post was about getting your insight on how you would go about answering this type of issue, and what your schemas might be. Overall, how to switch from the SQL vision to the NoSQL vision. Coprocessors to create a couple of tables on the fly for all questions are an interesting approach. To MapReduce the logs, however, I am afraid the performance would be too slow. I was thinking of answering in milliseconds if possible. But this might be me being new and not evaluating correctly.
Re: How to check if a major_compact is done?
Hi Yun: Please refer to the HBase metrics documentation: http://hbase.apache.org/book/hbase_metrics.html The hbase.regionserver.compactionQueueSize metric seems promising, but I'm not certain because I have never used it. Best Regards, Jerry On Thu, Nov 8, 2012 at 6:43 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Please someone correct me if I'm wrong, but I think there is some information exposed to JMX which give you the duration (and size) of the last compaction. JM 2012/11/8, PG pengyunm...@gmail.com: Hi, thanks for the comments. One thing is shouldn't web UI comes from the hbase API, or can I issue function call to get the progress of compaction?. Hun On Nov 8, 2012, at 1:33 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: There is no interface which says that the major compaction is completed. But you can see that major compaction is in progress from the web UI. Sorry if am wrong here. Regards Ram On Thu, Nov 8, 2012 at 11:38 AM, yun peng pengyunm...@gmail.com wrote: Hi, All, I want to measure the duration of a major compaction in HBase. Since the function call majorCompact is asynchronous, I may need to manually check when the major compaction is done. Does Hbase (as of version 0.92.4) provide an interface to determine completion of major compaction? Thanks. Yun
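If you are on a newer release than the 0.92 line discussed above, one option is to poll the admin API after requesting the compaction. A hedged sketch, assuming HBaseAdmin.getCompactionState is available (I believe it arrived around 0.94/0.96, and the CompactionState enum lives in different packages depending on the release); note the poll can also race with the start of the compaction, so treat the timing as approximate:

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.majorCompact("my_table");          // the request is asynchronous
    long start = System.currentTimeMillis();
    while (admin.getCompactionState("my_table") != CompactionState.NONE) {
      Thread.sleep(1000);                    // poll until nothing is compacting
    }
    System.out.println("major compaction took ~"
        + (System.currentTimeMillis() - start) + " ms");
    admin.close();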
Re: Is HBaseAdmin thread-safe?
Hi HBase users: I looked at the code; HBaseAdmin depends on HConnection. Is HConnectionImplementation thread-safe? Best Regards, Jerry On Wed, Nov 7, 2012 at 6:21 PM, Jerry Lam chiling...@gmail.com wrote: Hi HBase users: Is HBaseAdmin thread-safe? Best Regards, Jerry
Re: Best technique for doing lookup with Secondary Index
Can we enforce 2 regions to collocate together as a logical group? On Fri, Oct 26, 2012 at 6:14 AM, fding hbase fding.hb...@gmail.com wrote: https://github.com/danix800/hbase-indexed On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote: AFAIK, RPC cannot be avoided even if Region A and Region B are on same RS since these two regions are from different table. Am i right? No... suppose your Region A and Region B of different tables are collocated on same RS then from the coprocessor environment variable you can get access to the RS. From RS you can get the online regions and from that region object you can call puts or gets. This will not involve any RPC with in that RS because we only deal with Region objects. Regards Ram -Original Message- From: anil gupta [mailto:anilgupt...@gmail.com] Sent: Friday, October 26, 2012 12:17 PM To: user@hbase.apache.org Subject: Re: Best technique for doing lookup with Secondary Index Now your main question is lookups right Now there are some more hooks in the scan flow called pre/postScannerOpen, pre/postScannerNext. May be you can try using them to do a look up on the secondary table and then use those values and pass it to the main table next(). In secondary index its hard to avoid at-least two RPC calls(1 from client to table B and then from table B to Table A) whether you use coproc or not. But, i believe using coproc is better than doing RPC calls from client since it might be outside the subnet/network of cluster. In this case, the RPC will be faster when we use coprocs. In my case the client is certainly not in the same subnet or network zone. I need to provide results of query in around 100 milliseconds or less so i need to be really frugal. Let me know your views on this. Have you implemented queries with Secondary indexes using coproc yet? At present i have tried the client side query and i can get the results of query in around 100 ms. I am enticed to try out the coproc implementation. But this may involve more RPC calls as your regions of A and B may be in different RS. AFAIK, RPC cannot be avoided even if Region A and Region B are on same RS since these two regions are from different table. Am i right? Thanks, Anil Gupta On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan ramkrishna.vasude...@huawei.com wrote: Is it a good idea to create Htable instance on B and do put in my mapper? I might try this idea. Yes you can do this.. May be the same mapper you can do a put for table B. This was how we have tried loading data to another table by using the main table A Puts. Now your main question is lookups right Now there are some more hooks in the scan flow called pre/postScannerOpen, pre/postScannerNext. May be you can try using them to do a look up on the secondary table and then use those values and pass it to the main table next(). But this may involve more RPC calls as your regions of A and B may be in different RS. If something is wrong in my understanding of what you said, kindly spare me. :) Regards Ram -Original Message- From: anil gupta [mailto:anilgupt...@gmail.com] Sent: Friday, October 26, 2012 3:40 AM To: user@hbase.apache.org Subject: Re: Best technique for doing lookup with Secondary Index Anoop: In prePut hook u call HTable#put()? Anil: Yes i call HTable#put() in prePut. Is there better way of doing it? Anoop: Why use the network calls from server side here then? Anil: I thought this is a cleaner approach since i am using BulkLoader. 
I decided not to run two jobs since i am generating a UniqueIdentifier at runtime in bulkloader. Anoop: can not handle it from client alone? Anil: I cannot handle it from client since i am using BulkLoader. Is it a good idea to create Htable instance on B and do put in my mapper? I might try this idea. Anoop: You can have a look at Lily project. Anil: It's little late for us to evaluate Lily now and at present we dont need complex secondary index since our data is immutable. Ram: what is rowkey B here? Anil: Suppose i am storing customer events in table A. I have two requirement for data query: 1. Query customer events on basis of customer_Id and event_ID. 2. Query customer events on basis of event_timestamp and customer_ID. 70% of querying is done by query#1, so i will create customer_Idevent_ID as row key of Table A. Now, in order to support fast results for query#2, i need to create a secondary index on A. I
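For illustration, a hedged sketch of the coprocessor-based index maintenance being discussed: a RegionObserver whose prePut writes a mirror row into the index table. It assumes the 0.94-era prePut signature and a made-up index key derivation ("table_B" stands for the secondary table from the discussion); it is a sketch of the approach, not the poster's actual code, and as noted in the thread the extra server-side write has its own cost:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SecondaryIndexObserver extends BaseRegionObserver {
      @Override
      public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
          Put put, WALEdit edit, boolean writeToWAL) throws IOException {
        byte[] dataRow = put.getRow();
        // hypothetical index key: the reversed data key
        Put indexPut = new Put(reverse(dataRow));
        indexPut.add(Bytes.toBytes("d"), Bytes.toBytes("ref"), dataRow);
        HTableInterface indexTable =
            ctx.getEnvironment().getTable(Bytes.toBytes("table_B"));
        try {
          indexTable.put(indexPut);
        } finally {
          indexTable.close();
        }
      }

      private static byte[] reverse(byte[] b) {
        byte[] r = new byte[b.length];
        for (int i = 0; i < b.length; i++) {
          r[i] = b[b.length - 1 - i];
        }
        return r;
      }
    }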
Re: Coprocessor end point vs MapReduce?
Hi JM: There was a thread discussing M/R bulk delete vs. Coprocessor bulk delete. The thread subject is Bulk Delete. The guy in that post suggested to write a HFile which contains all the delete markers and then use bulk incremental load facility to actually move all the delete markers to the regions at once. This strategy works for my use case too because my M/R job generates a lot of version delete markers. You might take a look on that thread for additional ways to delete data from hbase. Best Regards, Jerry On Thu, Oct 25, 2012 at 1:13 PM, Anoop John anoop.hb...@gmail.com wrote: What I still don’t understand is, since both CP and MR are both running on the region side, with is the MR better than the CP? For the case bulk delete alone CP (Endpoint) will be better than MR for sure.. Considering your over all need people were suggesting better MR.. U need a scan and move some data into another table too... Both MR and CP run on the region side ??? - Well there is difference. The CP run within your RS process itself.. So that is why bulk delete using Endpoint is efficient.. It is a local read and delete. No n/w calls involved at all.. But in case of MR even if the mappers run on the same machine as that of the region it is a inter process communication.. Hope I explained you the diff well... -Anoop- On Thu, Oct 25, 2012 at 6:31 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi all, First, sorry about my slowness to reply to this thread, but it went to my spam folder and I lost sight of it. I don’t have good knowledge of RDBMS, and so I don’t have good knowledge of triggers too. That’s why I looked at the endpoints too because they are pretty new for me. First, I can’t really use multiple tables. I have one process writing to this table barely real-time. Another one is deleting from this table too. But some rows are never deleted. They are timing out, and need to be moved by the process I’m building here. I was not aware of the possibility to setup the priority for an MR job (any link to show how?). That’s something I will dig into. I was a bit scared about the network load if I’m doing deletes lines by lines and not bulk. What I still don’t understand is, since both CP and MR are both running on the region side, with is the MR better than the CP? Because the hadoop framework is taking care of it and will guarantee that it will run on all the regions? Also, is there some sort of “pre” and “post” methods I can override for MR jobs to initially list of puts/deletes and submit them at the end? Or should I do that one by one on the map method? Thanks, JM 2012/10/18, lohit lohit.vijayar...@gmail.com: I might be little off here. If rows are moved to another table on weekly or daily basis, why not create per weekly or per day table. That way you need to copy and delete. Of course it will not work you are are selectively filtering between timestamps and clients have to have notion of multiple tables. 2012/10/18 Anoop Sam John anoo...@huawei.com A CP and Endpoints operates at a region level.. Any operation within one region we can perform using this.. I have seen in below use case that along with the delete there was a need for inserting data to some other table also.. Also this was kind of a periodic action.. I really doubt how the endpoints alone can be used here.. I also tend towards the MR.. The idea behind the bulk delete CP is simple. We have a use case of deleting a bulk of rows and this need to be online delete. 
I also have seen in the mailing list many people ask question regarding that... In all people were using scans and get the rowkeys to the client side and then doing the deletes.. Yes most of the time complaint was the slowness.. One bulk delete performance improvement was done in HBASE-6284.. Still thought we can do all the operation (scan+delete) in server side and we can make use of the endpoints here.. This will be much more faster and can be used for online bulk deletes.. -Anoop- From: Michael Segel [michael_se...@hotmail.com] Sent: Thursday, October 18, 2012 11:31 PM To: user@hbase.apache.org Subject: Re: Coprocessor end point vs MapReduce? Doug, One thing that concerns me is that a lot of folks are gravitating to Coprocessors and may be using them for the wrong thing. Has anyone done any sort of research as to some of the limitations and negative impacts on using coprocessors? While I haven't really toyed with the idea of bulk deletes, periodic deletes is probably not a good use of coprocessors however using them to synchronize tables would be a valid use case. Thx -Mike On Oct 18, 2012, at 7:36 AM, Doug Meil doug.m...@explorysmedical.com wrote: To echo what
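Picking up the bulk-load-of-delete-markers idea mentioned at the start of this thread, here is a hedged sketch of a mapper that emits version delete markers as KeyValues for HFileOutputFormat (the input format and field layout are hypothetical, and job setup via HFileOutputFormat.configureIncrementalLoad is assumed); a KeyValue of Type.Delete targets exactly one cell version:

    import java.io.IOException;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: each tab-separated input line names one cell version to delete.
    public class DeleteMarkerMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
      @Override
      protected void map(LongWritable key, Text line, Context context)
          throws IOException, InterruptedException {
        String[] f = line.toString().split("\t");   // row, family, qualifier, timestamp
        byte[] row = Bytes.toBytes(f[0]);
        KeyValue marker = new KeyValue(row, Bytes.toBytes(f[1]), Bytes.toBytes(f[2]),
            Long.parseLong(f[3]), KeyValue.Type.Delete);
        context.write(new ImmutableBytesWritable(row), marker);
      }
    }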
Question on Scanner REST API Usage
Hi HBase community: I have a few questions on the usage of Scanner via the REST API: - From the XML schema ( http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/package-summary.html#xmlschema), we can set the maximum number of values to return for each call to next() by specifying the batch attribute. Is there a way to set the number of rows for caching that will be passed to scanners (setCaching)? - Also, I cannot find a way to get all columns of a single row for each call to next(). Can someone tell me if this is possible? Note that setting the batch size won't work because, for example, some rows might have 10 columns and other rows might have 5 columns, so setting batch to 10 will include cells that are from other rows. I want an API that behaves like the native Java API and returns all columns of a row when I call next(). Any help is greatly appreciated. Best Regards, Jerry
Re: Using filters in REST/stargate returns 204 (No content)
Hi Suresh: Have you tried to create a scanner without the filter? Does it return errors as well? Best Regards, Jerry On Fri, Oct 19, 2012 at 1:16 PM, Kumar, Suresh suresh.kum...@emc.comwrote: Here is the hbase shell command which works, I am not able to get these results using curl/stargate. scan 'apachelogs', { COLUMNS = 'mylog:pcol', FILTER = SingleColumnValueFilter('mylog','pcol', =, 'regexstring: ERROR x.') } Here is the curl command which does not work: curl -v -H Content-Type:text/xml -d @args.txt http://localhost:8080/apachelogs/scanner where args.txt: Scanner filter { latestVersion:true, ifMissing:true, qualifier:cGNvbAo=, family:a2V5c3RvbmVsb2cK, op:EQUAL, type:SingleColumnValueFilter, comparator:{value:RVJST1Igc2VydmljZSBhdXRoZW50aWNhdGUgVXNlcgo=,ty pe:RegexStringComparator} } /filter /Scanner Thanks, Suresh -Original Message- From: Andrew Purtell [mailto:apurt...@apache.org] Sent: Thursday, October 18, 2012 1:19 PM To: user@hbase.apache.org Subject: Re: Using filters in REST/stargate returns 204 (No content) What does the HBase shell return if you try that scan programatically? On Thu, Oct 18, 2012 at 11:02 AM, Kumar, Suresh suresh.kum...@emc.comwrote: I have a HBase Java client which has a couple of filters and just work fine, I get the expected result. Here is the code: HTable table = new HTable(conf, apachelogs); Scan scan = new Scan(); FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL); RegexStringComparator comp = new RegexStringComparator(ERROR x.); SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes(mylog), Bytes.toBytes(pcol), CompareOp.EQUAL, comp); filter.setFilterIfMissing(true); list.addFilter(filter); scan.setFilter(list); ResultScanner scanner = table.getScanner(scan); I startup the REST server, and use curl for the above functionality, I just base 64 encoded ERROR x.: curl -v -H Content-Type:text/xml -d @args.txt http://localhost:8080/apachelogs/scanner where args.txt is: Scanner filter { latestVersion:true, ifMissing:true, qualifier:pcol, family:mylog, op:EQUAL, type:SingleColumnValueFilter, comparator:{value:RVJST1Igc2VydmljZSBhdXRoZW50aWNhdGUgVXNlcgo=,ty pe:RegexStringComparator} } /filter /Scanner which returns * About to connect() to localhost port 8080 (#0) * Trying 127.0.0.1... connected POST /apachelogs/scanner HTTP/1.1 User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3 Host: localhost:8080 Accept: */* Content-Type:text/xml Content-Length: 318 * upload completely sent off: 318out of 318 bytes HTTP/1.1 201 Created Location: http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6 Content-Length: 0 * Connection #0 to host localhost left intact * Closing connection #0 but curl -v http://localhost:8080/apachelogs/scanner/13505819795654de4e6c6 returns HTTP/1.1 204 No Content Any clues? Thanks, Suresh -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
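To act on the suggestion above (create a scanner without the filter first and see whether plain rows come back), a minimal sketch against the same table and REST endpoint used in the thread; the batch size is arbitrary, and SCANNER_ID stands in for whatever identifier the server returns in the Location header:

    # create a scanner with no filter; batch = number of cells returned per next() call
    curl -v -H "Content-Type: text/xml" -d '<Scanner batch="10"/>' \
      http://localhost:8080/apachelogs/scanner

    # then read from the scanner URL returned in the Location header
    curl -v -H "Accept: text/xml" http://localhost:8080/apachelogs/scanner/SCANNER_ID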
Re: Slow scanning for PrefixFilter on EncodedBlocks
Hi ./zahoor: I don't think it is the same issue. Did you provide the Scan object with the startkey = prefix? something like: Scan scan = new Scan(prefix); My understanding is that the PrefixFilter does not Seek to the key with Prefix therefore, the Scanner basically start from the beginning of the table and apply the Prefix filter to each key values. From this perspective, the PrefixFilter might be improved by using Hint though.. Best Regards, Jerry On Mon, Oct 15, 2012 at 1:27 PM, J Mohamed Zahoor jmo...@gmail.com wrote: Is this related to HBASE-6757 ? I use a filter list with - prefix filter - filter list of column filters /zahoor On Monday, October 15, 2012, J Mohamed Zahoor wrote: Hi My scanner performance is very slow when using a Prefix filter on a **Encoded Column** ( encoded using FAST_DIFF on both memory and disk). I am using 94.1 hbase. jstack shows that much time is spent on seeking the row. Even if i give a exact row key match in the prefix filter it takes about two minutes to return a single row. Running this multiple times also seems to be redirecting things to disk (loadBlock). at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167) at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521) - locked 0x00059584fab8 (a org.apache.hadoop.hbase.regionserver.StoreScanner) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402) - locked 0x00059584fab8 (a org.apache.hadoop.hbase.regionserver.StoreScanner) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3455) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3406) - locked 0x00059589bb30 (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3423) If is set the start and end row as same row in scan ... it come in very quick. Saw this link http://search-hadoop.com/m/9f0JH1Kz24U1subj=Re+HBase+0+94+2+SNAPSHOT+Scanning+Bug But it looks like things are fine in 94.1. Any pointers on why this is slow? Note: the row has not many columns(5 and less than a kb) and lots of versions (1500+) ./zahoor
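A hedged sketch of the suggestion above (the table handle and prefix are placeholders): seed the Scan with the prefix as the start row so the region server seeks straight to it, and keep the PrefixFilter so the scan stops matching once keys move past the prefix:

    byte[] prefix = Bytes.toBytes("user123|");    // hypothetical row-key prefix
    Scan scan = new Scan(prefix);                 // start at the prefix instead of the table start
    scan.setFilter(new PrefixFilter(prefix));     // stop returning rows once keys pass the prefix
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r
      }
    } finally {
      scanner.close();
    }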
Re: bulk deletes
Hi Anoop: In my use case, I use extensively the version delete marker because I need to delete a specific version of a cell (row key, CF, qualifier, timestamp). I have a mapreduce job that will run across some regions and based on some business rules, some of the cells will be deleted in the table using the version delete marker. The business rules for deletion are scoped to each column family at a time. Therefore, there are no logically dependency of deletions between column families. I also posted the above use case in the HBASE-6942. Best Regards, Jerry On Thu, Oct 11, 2012 at 12:04 AM, Anoop Sam John anoo...@huawei.com wrote: You are right Jerry.. In your use case you want to delete full rows or some cfs/columns only? Pls feel free to see the issue HBASE-6942 and give your valuable comments.. Here I am trying to delete the rows [This is our use case] -Anoop- From: Jerry Lam [chiling...@gmail.com] Sent: Wednesday, October 10, 2012 8:37 PM To: user@hbase.apache.org Subject: Re: bulk deletes Hi guys: The bulk delete approaches described in this thread are helpful in my case as well. If I understood correctly, Paul's approach is useful for offline bulk deletes (a.k.a. mapreduce) whereas Anoop's approach is useful for online/real-time bulk deletes (a.k.a. co-processor)? Best Regards, Jerry On Mon, Oct 8, 2012 at 7:45 AM, Paul Mackles pmack...@adobe.com wrote: Very cool Anoop. I can definitely see how that would be useful. Lars - the bulk deletes do appear to work. I just wasn't sure if there was something I might be missing since I haven't seen this documented elsewhere. Coprocessors do seem a better fit for this in the long term. Thanks everyone. On 10/7/12 11:55 PM, Anoop Sam John anoo...@huawei.com wrote: We also done an implementation using compaction time deletes(avoid KVs). This works very well for us As this would delay the deletes to happen till the next major compaction, we are having an implementation to do the real time bulk delete. [We have such use case] Here I am using an endpoint implementation to do the scan and delete at the server side only. Just raised an IA for this [HBASE-6942]. I will post a patch based on 0.94 model there...Pls have a look I have noticed big performance improvement over the normal way of scan() + delete(ListDelete) as this avoids several network calls and traffic... -Anoop- From: lars hofhansl [lhofha...@yahoo.com] Sent: Saturday, October 06, 2012 1:09 AM To: user@hbase.apache.org Subject: Re: bulk deletes Does it work? :) How did you do the deletes before?I assume you used the HTable.delete(ListDelete) API? (Doesn't really help you, but) In 0.92+ you could hook up a coprocessor into the compactions and simply filter out any KVs you want to have removed. -- Lars From: Paul Mackles pmack...@adobe.com To: user@hbase.apache.org user@hbase.apache.org Sent: Friday, October 5, 2012 11:17 AM Subject: bulk deletes We need to do deletes pretty regularly and sometimes we could have hundreds of millions of cells to delete. TTLs won't work for us because we have a fair amount of bizlogic around the deletes. Given their current implemention (we are on 0.90.4), this delete process can take a really long time (half a day or more with 100 or so concurrent threads). From everything I can tell, the performance issues come down to each delete being an individual RPC call (even when using the batch API). In other words, I don't see any thrashing on hbase while this process is running just lots of waiting for the RPC calls to return. 
The alternative we came up with is to use the standard bulk load facilities to handle the deletes. The code turned out to be surpisingly simple and appears to work in the small-scale tests we have tried so far. Is anyone else doing deletes in this fashion? Are there drawbacks that I might be missing? Here is a link to the code: https://gist.github.com/3841437 Pretty simple, eh? I haven't seen much mention of this technique which is why I am a tad paranoid about it. Thanks, Paul
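For completeness, this is what a single version delete marker looks like through the ordinary client API (row, column, and timestamp are placeholders): Delete.deleteColumn with an explicit timestamp targets exactly one cell version, which is the kind of marker being generated in bulk in this thread:

    Delete d = new Delete(Bytes.toBytes("row-1"));
    // version delete marker: removes only the cell stored at this exact timestamp
    d.deleteColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), 1349400000000L);
    table.delete(d);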
Re: Problem with Rest Java Client
Hi Erman: I think this post summed up very well about deletion in hbase ( http://hadoop-hbase.blogspot.ca/2011/12/deletion-in-hbase.html) Please have a look. If you need to delete a specific version of a cell, you can use version delete marker as described in the post. Best Regards, Jerry On Tue, Oct 9, 2012 at 1:24 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Erman, It's normal. At t=1 you insert val1 At t=2 you insert val2 At t=3 you put a marker that row1:farm1:q1 values are deleted. When you try to read the values, HBase will hide all that is before t=3 because of the marker. Which mean you will not see val2 neither you will see val1. I think you can still see them if you read ALL the version for the row. JM 2012/10/9, Erman Pattuk ermanpat...@gmail.com: Hi, I have started using HBase Rest Java client as a part of my project. I see that it may have a problem with the Delete operation. For a given Delete object, if you apply deleteColumn(family, qualifier) on it, all matching qualifiers are deleted instead of the latest value. In order to recreate the problem: 1 - Create table tab1, with family fam1. 2 - Through shell, insert two values, as: row1, fam1, q1, val1 row1, fam1, q1, val2 3 - Through Rest Java client: Delete delItem = new Delete(Bytes.toBytes(row1)); delItem.deleteColumn(Bytes.toBytes(fam1), Bytes.toBytes(q1)); table.delete(delItem); 4 - All q1 values are deleted, instead of the latest q1 value, which is val2. Is that an expected result? Thanks, Erman
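As a point of comparison with the native Java client (the REST gateway may map its delete differently, which could explain what Erman sees): deleteColumn without a timestamp is documented to delete only the latest version, while deleteColumns deletes all versions of the column:

    Delete d = new Delete(Bytes.toBytes("row1"));
    d.deleteColumn(Bytes.toBytes("fam1"), Bytes.toBytes("q1"));     // latest version only
    // d.deleteColumns(Bytes.toBytes("fam1"), Bytes.toBytes("q1")); // all versions of q1
    table.delete(d);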
Re: bulk deletes
Hi guys: The bulk delete approaches described in this thread are helpful in my case as well. If I understood correctly, Paul's approach is useful for offline bulk deletes (a.k.a. mapreduce) whereas Anoop's approach is useful for online/real-time bulk deletes (a.k.a. co-processor)? Best Regards, Jerry On Mon, Oct 8, 2012 at 7:45 AM, Paul Mackles pmack...@adobe.com wrote: Very cool Anoop. I can definitely see how that would be useful. Lars - the bulk deletes do appear to work. I just wasn't sure if there was something I might be missing since I haven't seen this documented elsewhere. Coprocessors do seem a better fit for this in the long term. Thanks everyone. On 10/7/12 11:55 PM, Anoop Sam John anoo...@huawei.com wrote: We also done an implementation using compaction time deletes(avoid KVs). This works very well for us As this would delay the deletes to happen till the next major compaction, we are having an implementation to do the real time bulk delete. [We have such use case] Here I am using an endpoint implementation to do the scan and delete at the server side only. Just raised an IA for this [HBASE-6942]. I will post a patch based on 0.94 model there...Pls have a look I have noticed big performance improvement over the normal way of scan() + delete(ListDelete) as this avoids several network calls and traffic... -Anoop- From: lars hofhansl [lhofha...@yahoo.com] Sent: Saturday, October 06, 2012 1:09 AM To: user@hbase.apache.org Subject: Re: bulk deletes Does it work? :) How did you do the deletes before?I assume you used the HTable.delete(ListDelete) API? (Doesn't really help you, but) In 0.92+ you could hook up a coprocessor into the compactions and simply filter out any KVs you want to have removed. -- Lars From: Paul Mackles pmack...@adobe.com To: user@hbase.apache.org user@hbase.apache.org Sent: Friday, October 5, 2012 11:17 AM Subject: bulk deletes We need to do deletes pretty regularly and sometimes we could have hundreds of millions of cells to delete. TTLs won't work for us because we have a fair amount of bizlogic around the deletes. Given their current implemention (we are on 0.90.4), this delete process can take a really long time (half a day or more with 100 or so concurrent threads). From everything I can tell, the performance issues come down to each delete being an individual RPC call (even when using the batch API). In other words, I don't see any thrashing on hbase while this process is running just lots of waiting for the RPC calls to return. The alternative we came up with is to use the standard bulk load facilities to handle the deletes. The code turned out to be surpisingly simple and appears to work in the small-scale tests we have tried so far. Is anyone else doing deletes in this fashion? Are there drawbacks that I might be missing? Here is a link to the code: https://gist.github.com/3841437 Pretty simple, eh? I haven't seen much mention of this technique which is why I am a tad paranoid about it. Thanks, Paul
Re: key design
Hi: So you are saying you have ~3TB of data stored per day? Using the second approach, all data for one day will go to only one regionserver no matter what you do, because HBase doesn't split a single row. Using the first approach, data will spread across regionservers, but writes will still hotspot since this is a time-series problem. Best Regards, Jerry On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio yutoo.ya...@gmail.com wrote: hi, i have a question about key/column design. in my application we have 3,000,000,000 records every day. each record contains: user-id, time stamp, content (max 1KB). we need to store records for one year, which means we will have about 1,000,000,000,000 records after 1 year. we just search by a user-id over a range of time stamps. the table can be designed in two ways: 1. key=userid-timestamp and column:=content 2. key=userid-MMdd and column:HHmmss=content in the first design we have a tall-narrow table but very, very many records; in the second design we have a flat-wide table. which of them has better performance? thanks.
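For what it's worth, a hedged sketch of how the first (tall-narrow) design is usually queried, which matches the access pattern in the question; the key layout, table handle, and fixed-width timestamp encoding are assumptions for illustration only:

    // row key = userId + "-" + fixed-width timestamp, so one user's rows sort by time
    byte[] startRow = Bytes.toBytes(userId + "-" + String.format("%013d", startTs));
    byte[] stopRow  = Bytes.toBytes(userId + "-" + String.format("%013d", endTs));
    Scan scan = new Scan(startRow, stopRow);   // touches only this user's time range
    ResultScanner scanner = table.getScanner(scan);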
Re: key design
That's true.Then there would be max. 86,400 records per day per userid. That is about 100MB per day. I don't see much difference in both approaches from the storage perspective. On Wed, Oct 10, 2012 at 1:09 PM, Doug Meil doug.m...@explorysmedical.comwrote: Hi there- Given the fact that the userid is in the lead position of the key in both approaches, I'm not sure that he'd have a region hotspotting problem because the userid should be able to offer some spread. On 10/10/12 12:55 PM, Jerry Lam chiling...@gmail.com wrote: Hi: So you are saying you have ~3TB of data stored per day? Using the second approach, all data for one day will go to only 1 regionserver no matter what you do because HBase doesn't split a single row. Using the first approach, data will spread across regionservers but there will be hotspotted to each regionserver during write since this is a time-series problem. Best Regards, Jerry On Wed, Oct 10, 2012 at 11:24 AM, yutoo yanio yutoo.ya...@gmail.com wrote: hi i have a question about key column design. in my application we have 3,000,000,000 record in every day each record contain : user-id, time stamp, content(max 1KB). we need to store records for one year, this means we will have about 1,000,000,000,000 after 1 year. we just search a user-id over rang of time stamp table can design in two way 1.key=userid-timestamp and column:=content 2.key=userid-MMdd and column:HHmmss=content in first design we have tall-narrow table but we have very very records, in second design we have flat-wide table. which of them have better performance? thanks.
Re: HBase Key Design : Doubt
correct me if I'm wrong. The version applies to the individual cell (ie. row key, column family and column qualifier) not (row key, column family). On Wed, Oct 10, 2012 at 3:13 PM, Narayanan K knarayana...@gmail.com wrote: Hi all, I have a usecase wherein I need to find the unique of some things in HBase across dates. Say, on 1st Oct, A-B-C-D appeared, hence I insert a row with rowkey : A-B-C-D. On 2nd Oct, I get the same value A-B-C-D and I don't want to redundantly store the row again with a new rowkey - A-B-C-D for 2nd Oct i.e I will not want to have 20121001-A-B-C-D and 20121002-A-B-C-D as 2 rowkeys in the table. Eg: If I have 1st Oct , 2nd Oct as 2 column families and if number of versions are set to 1, only 1 row will be present in for both the dates having rowkey A-B-C-D. Hence if I need to find unique number of times A-B-C-D appeared during Oct 1 and Oct 2, I just need to take rowcount of the row A-B-C-D by filtering over the 2 column families. Similarly, if we have 10 date column families, and I need to scan only for 2 dates, then it scans only those store files having the specified column families. This will make scanning faster. But here the design problem is that I cant add more column families to the table each day. I would need to store data every day and I read that HBase doesnt work well with more than 3 column families. The other option is to have one single column family and store dates as qualifiers : date:d1, date:d2 But here if there are 30 date qualifiers under date column family, to scan a single date qualifier or may be range of 2-3 dates will have to scan through the entire data of all d1 to d30 qualifiers in the date column family which would be slower compared to having separate column families for the each date.. Please share your thoughts on this. Also any alternate design suggestions you might have. Regards, Narayanan
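If the single "date" family route is taken, the scan does not have to touch all 30 qualifiers. A hedged sketch (family and qualifier names are assumptions) of picking individual days, or a contiguous range of days with ColumnRangeFilter, which is available in 0.92+:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class DateQualifierScans {
  static final byte[] DATE_FAMILY = Bytes.toBytes("date");

  // Only the named day qualifiers are requested, not the whole family.
  static Scan twoSpecificDays() {
    Scan scan = new Scan();
    scan.addColumn(DATE_FAMILY, Bytes.toBytes("20121001"));
    scan.addColumn(DATE_FAMILY, Bytes.toBytes("20121002"));
    return scan;
  }

  // A contiguous range of days, e.g. Oct 1 (inclusive) to Oct 4 (exclusive).
  static Scan dayRange() {
    Scan scan = new Scan();
    scan.addFamily(DATE_FAMILY);
    scan.setFilter(new ColumnRangeFilter(
        Bytes.toBytes("20121001"), true,
        Bytes.toBytes("20121004"), false));
    return scan;
  }
}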
Re: How to specify empty value in HBase shell
Hi St.Ack: I made some dirty changes to the script yesterday to work for me. Basically, I changed the parse_column_name(column) function to:

def parse_column_name(column)
  split = org.apache.hadoop.hbase.KeyValue.parseColumn(column.to_java_bytes)
  return split[0], (split.length > 1) ? split[1] : (org.apache.hadoop.hbase.KeyValue.getDelimiter(column.to_java_bytes, 0, column.to_java_bytes.length(), 58) > 0) ? ''.to_java_bytes : nil
end

Not sure if it makes sense as the general solution to the problem but at least it seems to do the job. The end result is that, if the user specifies COLUMNS without the delimiter, it is treated as a column family without a qualifier. If there is a delimiter but the split has only 1 element, then the column qualifier is set to the empty value. Best Regards, Jerry On Fri, Sep 21, 2012 at 12:42 AM, Stack st...@duboce.net wrote: On Thu, Sep 20, 2012 at 7:31 AM, Jerry Lam chiling...@gmail.com wrote: Hi HBase Community: I have been struggling to find a way to specify an empty value/empty column qualifier in the hbase shell, but have been unsuccessful. I googled it, nothing comes up. I don't know JRuby so that might be why. Do you know how? Example: scan 'Table', {COLUMNS => 'cf:'} // note that the column family is cf and the column qualifier is empty (i.e. new byte[0]) The above query will return all columns instead of the empty one. Sounds like no qualifier means all columns to the shell. Do you have to use the 'empty qualifier'? That's a bit odd. You really need it in your model? In the shell we are doing this:

columns.each do |c|
  family, qualifier = parse_column_name(c.to_s)
  if qualifier
    scan.addColumn(family, qualifier)
  else
    scan.addFamily(family)
  end
end

If no qualifier, we think it's a scan of the family. I don't really have a good answer for you. In shell, what would you suggest we add so we do addColumn rather than addFamily if the qualifier is empty? St.Ack
How to specify empty value in HBase shell
Hi HBase Community: I have been struggling to find a way to specify an empty value/empty column qualifier in the hbase shell, but have been unsuccessful. I googled it, and nothing comes up. I don't know JRuby so that might be why. Do you know how? Example: scan 'Table', {COLUMNS => 'cf:'} // note that the column family is cf and the column qualifier is empty (i.e. new byte[0]) The above query will return all columns instead of the empty one. I need only the values associated with the empty column qualifier. Please help ~ Jerry
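For what it's worth, the Java API has no trouble naming the empty qualifier explicitly; the shell change above just emulates this. A minimal sketch, where the table, row, and family names are simply the ones from the shell example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class EmptyQualifierGet {
  static byte[] readEmptyQualifier(Configuration conf, byte[] row) throws Exception {
    HTable table = new HTable(conf, "Table");
    try {
      Get get = new Get(row);
      // HConstants.EMPTY_BYTE_ARRAY is new byte[0], i.e. the empty qualifier
      get.addColumn(Bytes.toBytes("cf"), HConstants.EMPTY_BYTE_ARRAY);
      Result result = table.get(get);
      return result.getValue(Bytes.toBytes("cf"), HConstants.EMPTY_BYTE_ARRAY);
    } finally {
      table.close();
    }
  }
}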
Re: Undelete Rows
Hi Alex: we have functionality that allows users to delete the data stored in hbase, but once in a while, users can call us to undelete certain data that was deleted an hour/day ago. Since we run major compaction weekly, my wishful thinking was that the data is still there and can be recovered if we can just delete the delete marker. It seems it is not as easy as I initially thought after reading the replies. Lars' suggestion requires reading the deleted data and writing it back, which can be expensive. Best Regards, Jerry On Wed, Sep 19, 2012 at 10:07 AM, Alex Baranau alex.barano...@gmail.com wrote: Hi Jerry, Just out of curiosity: what is your use-case? Why do you want to do that? To gain extra protection from software error or smth else? Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Tue, Sep 18, 2012 at 6:32 PM, lars hofhansl lhofha...@yahoo.com wrote: Can't do it (without low level massaging of HFiles, which you do not want to do). The best you can do (if you have HBASE-4536 and enabled KEEP_DELETED_CELLS for your column family) is to read the deleted rows back and write them again with a newer TS. -- Lars From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org Sent: Tuesday, September 18, 2012 3:06 PM Subject: Undelete Rows Hi HBase Community: I wonder if it is possible to undelete the rows that have been marked for deletion before the major compaction kicks in? I read about HBASE-4536 but I'm not sure if this can effectively remove the tombstone marker? Any input is appreciated. Best Regards, Jerry
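For completeness, a rough sketch of Lars' "read it back and re-write it" idea. It assumes a 0.94+ cluster with KEEP_DELETED_CELLS enabled on the family and that major compaction has not yet purged the old cells; the raw scan is my choice for surfacing the deleted versions and is not something the thread prescribes:

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class Undelete {
  // Re-write the surviving (not yet compacted away) versions of one row with
  // fresh timestamps so they become visible again.
  static void undeleteRow(HTable table, byte[] row) throws Exception {
    Scan scan = new Scan(row, Bytes.add(row, new byte[] { 0 }));  // just this row
    scan.setRaw(true);          // a raw scan also returns deleted cells and markers
    scan.setMaxVersions();
    ResultScanner scanner = table.getScanner(scan);
    Put put = new Put(row);
    for (Result r : scanner) {
      for (KeyValue kv : r.raw()) {
        if (kv.isDelete()) {
          continue;             // skip the tombstones themselves
        }
        put.add(kv.getFamily(), kv.getQualifier(), kv.getValue());
      }
    }
    scanner.close();
    if (!put.isEmpty()) {
      table.put(put);
    }
  }
}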
Re: HBase aggregate query
Hi Prabhjot: Can you implement this using a counter? That is whenever you insert a row with the month(eventdate) and scene combination, increment the associated counter by one. Note that if you have a batch insert of N, you can increment the counter by N. Then you can simply query the counter whenever you want the aggregated result. HTH, Jerry On Tue, Sep 11, 2012 at 1:59 PM, lars hofhansl lhofha...@yahoo.com wrote: That's when you aggregate along a sorted dimension (prefix of the key), though. Right? Not sure how smart Hive is here, but if it needs to sort the data it will probably be slower than SQL Server for such a small data set. - Original Message - From: James Taylor jtay...@salesforce.com To: user@hbase.apache.org Cc: Sent: Monday, September 10, 2012 5:49 PM Subject: Re: HBase aggregate query iwannaplay games funnlearnforkids@... writes: Hi , I want to run query like select month(eventdate),scene,count(1),sum(timespent) from eventlog group by month(eventdate),scene in hbase.Through hive its taking a lot of time for 40 million records.Do we have any syntax in hbase to find its result?In sql server it takes around 9 minutes,How long it might take in hbase?? Regards Prabhjot Hi, In our internal testing using server-side coprocessors for aggregation, we've found HBase can process these types of queries very quickly: ~10-12 seconds using a four node cluster. You need to chunk up and parallelize the work on the client side to get this kind of performance, though. Regards, James
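A minimal sketch of the counter idea: one counter cell per (month, scene), bumped at insert time and read back for the report. The table layout, family, and qualifier names here are made up for illustration:

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class EventCounters {
  static final byte[] FAMILY = Bytes.toBytes("agg");
  static final byte[] COUNT = Bytes.toBytes("count");
  static final byte[] TIMESPENT = Bytes.toBytes("timespent");

  // Called alongside the insert of a batch of events; increments are atomic
  // on the server, so concurrent writers do not step on each other.
  static void record(HTable counters, String month, String scene,
                     long events, long timeSpent) throws Exception {
    byte[] row = Bytes.toBytes(month + ":" + scene);
    counters.incrementColumnValue(row, FAMILY, COUNT, events);
    counters.incrementColumnValue(row, FAMILY, TIMESPENT, timeSpent);
  }

  // The "group by month, scene" result is then a single-row read per group.
  static long readCount(HTable counters, String month, String scene) throws Exception {
    Result r = counters.get(new Get(Bytes.toBytes(month + ":" + scene)));
    byte[] v = r.getValue(FAMILY, COUNT);
    return v == null ? 0L : Bytes.toLong(v);
  }
}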
Re: Reading in parallel from table's regions in MapReduce
Hi Ioakim: Sorry, your hypothesis doesn't make sense. I would suggest you read the Learning HBase Internals talk by Lars Hofhansl at http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final to understand how HBase locking works. Regarding the issue you are facing, are you sure you configured the job properly (i.e. requesting the jobtracker to execute more than 1 mapper)? If you are testing on a single machine, you probably need to configure the number of map slots per tasktracker as well to see more than 1 mapper execute on a single machine. my $0.02 Jerry On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros imper...@gmail.com wrote: Hello, I would be grateful if someone could shed light on the following: Each M/R map task is reading data from a separate region of a table. From the jobtracker's GUI, at the map completion graph, I notice that although the data read by the mappers is different, they read data sequentially - like the table has a lock that permits only one mapper to read data from every region at a time. Does this lock hypothesis make sense? Is there any way I could avoid this useless delay? Thanks in advance and regards, Ioakim
Re: Reading in parallel from table's regions in MapReduce
Hi Ioakim: Here is a list of links I would suggest you read (I know it is a lot to read): HBase Related: - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.html - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description - make sure to read the examples: http://hbase.apache.org/book/mapreduce.example.html Hadoop Related: - http://wiki.apache.org/hadoop/JobTracker - http://wiki.apache.org/hadoop/TaskTracker - http://hadoop.apache.org/common/docs/r1.0.3/mapred_tutorial.html - Some Configurations: http://hadoop.apache.org/common/docs/r1.0.3/cluster_setup.html HTH, Jerry On Tue, Sep 4, 2012 at 12:41 PM, Michael Segel michael_se...@hotmail.com wrote: I think the issue is that you are misinterpreting what you are seeing and what Doug was trying to tell you... The short simple answer is that you're getting one split per region. Each split is assigned to a specific mapper task and that task will sequentially walk through the table finding the rows that match your scan request. There is no lock or blocking. I think you really should actually read Lars George's book on HBase to get a better understanding. HTH -Mike On Sep 4, 2012, at 11:29 AM, Ioakim Perros imper...@gmail.com wrote: Thank you very much for your response and for the excellent reference. The thing is that I am running jobs on a distributed environment and beyond the TableMapReduceUtil settings, I have just set the scan's caching to the number of rows I expect to retrieve at each map task, and the scan's cache-blocks feature to false (just as is indicated in the MapReduce examples on HBase's homepage). I am not aware of such a job configuration (requesting the jobtracker to execute more than 1 map task concurrently). Any other ideas? Thank you again and regards, ioakim On 09/04/2012 06:59 PM, Jerry Lam wrote: Hi Ioakim: Sorry, your hypothesis doesn't make sense. I would suggest you read the Learning HBase Internals talk by Lars Hofhansl at http://www.slideshare.net/cloudera/3-learning-h-base-internals-lars-hofhansl-salesforce-final to understand how HBase locking works. Regarding the issue you are facing, are you sure you configured the job properly (i.e. requesting the jobtracker to execute more than 1 mapper)? If you are testing on a single machine, you probably need to configure the number of map slots per tasktracker as well to see more than 1 mapper execute on a single machine. my $0.02 Jerry On Tue, Sep 4, 2012 at 11:17 AM, Ioakim Perros imper...@gmail.com wrote: Hello, I would be grateful if someone could shed light on the following: Each M/R map task is reading data from a separate region of a table. From the jobtracker's GUI, at the map completion graph, I notice that although the data read by the mappers is different, they read data sequentially - like the table has a lock that permits only one mapper to read data from every region at a time. Does this lock hypothesis make sense? Is there any way I could avoid this useless delay? Thanks in advance and regards, Ioakim
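As a concrete starting point, a hedged sketch of the standard table-input job: TableInputFormat creates one map split per region, and how many of those run at once is then governed by the tasktrackers' map slots (mapred.tasktracker.map.tasks.maximum), not by HBase. The table name and mapper below are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class PerRegionScanJob {
  static class RowCountMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context) {
      context.getCounter("scan", "rows").increment(1);   // one counter bump per row
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "per-region scan");
    job.setJarByClass(PerRegionScanJob.class);
    Scan scan = new Scan();
    scan.setCaching(500);          // rows fetched per RPC inside each map task
    scan.setCacheBlocks(false);    // don't churn the region server block cache
    TableMapReduceUtil.initTableMapperJob("mytable", scan,
        RowCountMapper.class, null, null, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);
    job.waitForCompletion(true);
  }
}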
Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Lars: Thanks for spending time discussing this with me. I appreciate it. I tried to implement the setMaxVersions(1) behaviour inside the filter as follows:

@Override
public ReturnCode filterKeyValue(KeyValue kv) {
  // check if this is the same qualifier as the one that was included previously. If yes, jump to the next column
  if (previousIncludedQualifier != null && Bytes.compareTo(previousIncludedQualifier, kv.getQualifier()) == 0) {
    previousIncludedQualifier = null;
    return ReturnCode.NEXT_COL;
  }
  // another condition that makes the jump further using HINT
  if (Bytes.compareTo(this.qualifier, kv.getQualifier()) == 0) {
    LOG.info("Match found.");
    return ReturnCode.SEEK_NEXT_USING_HINT;
  }
  // include this in the result and keep track of the included qualifier so the next version of the same qualifier will be excluded
  previousIncludedQualifier = kv.getQualifier();
  return ReturnCode.INCLUDE;
}

Does this look reasonable or is there a better way to achieve this? It would be nice to have ReturnCode.INCLUDE_AND_NEXT_COL for this case though. Best Regards, Jerry On Wed, Aug 29, 2012 at 2:09 AM, lars hofhansl lhofha...@yahoo.com wrote: Hi Jerry, my answer will be the same again: Some folks will want the max versions set by the client to be applied before filters and some folks will want it to restrict the end result. It's not possible to have it both ways. Your filter needs to do the right thing. There's a lot of discussion around this in HBASE-5104. -- Lars From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Sent: Tuesday, August 28, 2012 1:52 PM Subject: Re: setTimeRange and setMaxVersions seem to be inefficient Hi Lars: I see. Please refer to the inline comment below. Best Regards, Jerry On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl lhofha...@yahoo.com wrote: What I was saying was: It depends. :) First off, how do you get to 1000 versions? In 0.94++ older versions are pruned upon flush, so you need 333 flushes (assuming 3 versions on the CF) to get 1000 versions. I forgot that the default number of versions to keep is 3. If this is what people use most of the time, yes, you are right for this type of scenario where the number of versions per column to keep is small. By that time some compactions will have happened and you're back to close to 3 versions (maybe 9, 12, or 15 or so, depending on how many store files you have). Now, if you have that many versions because you set VERSIONS=1000 in your CF... Then imagine you have 100 columns with 1000 versions each. Yes, imagine I set VERSIONS = Long.MAX_VALUE (i.e. I will manage the versioning myself) In your scenario below you'd do 10 comparisons if the filter would be evaluated after the version counting. But only 1100 with the current code. (or at least in that ball park) This is where I don't quite understand what you mean. If the framework counts the number of ReturnCode.INCLUDE and then stops feeding KeyValues into the filterKeyValue method after it reaches the count specified in setMaxVersions (i.e. 1 for the case we discussed), shouldn't there then be just 100 comparisons (at most) instead of 1100 comparisons? Maybe I don't understand what the current way is doing... The gist is: One can construct scenarios where one approach is better than the other. Only one order is possible. If you write a custom filter and you care about these things you should use the seek hints.
-- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Cc: Sent: Tuesday, August 28, 2012 7:17 AM Subject: Re: setTimeRange and setMaxVersions seem to be inefficient Hi Lars: Thanks for the reply. I need to understand if I misunderstood the perceived inefficiency because it seems you don't think quite the same. Let say, as an example, we have 1 row with 2 columns (col-1 and col-2) in a table and each column has 1000 versions. Using the following code (the code might have errors and don't compile): /** * This is very simple use case of a ColumnPrefixFilter. * In fact all other filters that make use of filterKeyValue will see similar * performance problems that I have concerned with when the number of * versions per column could be huge. Filter filter = new ColumnPrefixFilter(Bytes.toBytes(col-2)); Scan scan = new Scan(); scan.setFilter(filter); ResultScanner scanner = table.getScanner(scan); for (Result result : scanner) { for (KeyValue kv : result.raw()) { System.out.println(KV: + kv + , Value: + Bytes.toString(kv.getValue())); } } scanner.close(); */ Implicitly, the number of version per column that is going to return is 1 (the latest version). User might expect that only 2 comparisons for column prefix are needed (1 for col-1
Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Ted: Sure, will do. I also implement the reset method to set previousIncludedQualifier to null for the next row to come. Best Regards, Jerry On Wed, Aug 29, 2012 at 1:47 PM, Ted Yu yuzhih...@gmail.com wrote: Jerry: Remember to also implement: + @Override + public KeyValue getNextKeyHint(KeyValue currentKV) { You can log a JIRA for supporting ReturnCode.INCLUDE_AND_NEXT_COL. Cheers On Wed, Aug 29, 2012 at 6:59 AM, Jerry Lam chiling...@gmail.com wrote: Hi Lars: Thanks for spending time discussing this with me. I appreciate it. I tried to implement the setMaxVersions(1) inside the filter as follows: @Override public ReturnCode filterKeyValue(KeyValue kv) { // check if the same qualifier as the one that has been included previously. If yes, jump to next column if (previousIncludedQualifier != null Bytes.compareTo(previousIncludedQualifier,kv.getQualifier()) == 0) { previousIncludedQualifier = null; return ReturnCode.NEXT_COL; } // another condition that makes the jump further using HINT if (Bytes.compareTo(this.qualifier, kv.getQualifier()) == 0) { LOG.info(Matched Found.); return ReturnCode.SEEK_NEXT_USING_HINT; } // include this to the result and keep track of the included qualifier so the next version of the same qualifier will be excluded previousIncludedQualifier = kv.getQualifier(); return ReturnCode.INCLUDE; } Does this look reasonable or there is a better way to achieve this? It would be nice to have ReturnCode.INCLUDE_AND_NEXT_COL for this case though. Best Regards, Jerry On Wed, Aug 29, 2012 at 2:09 AM, lars hofhansl lhofha...@yahoo.com wrote: Hi Jerry, my answer will be the same again: Some folks will want the max versions set by the client to be before filters and some folks will want it to restrict the end result. It's not possible to have it both ways. Your filter needs to do the right thing. There's a lot of discussion around this in HBASE-5104. -- Lars From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Sent: Tuesday, August 28, 2012 1:52 PM Subject: Re: setTimeRange and setMaxVersions seem to be inefficient Hi Lars: I see. Please refer to the inline comment below. Best Regards, Jerry On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl lhofha...@yahoo.com wrote: What I was saying was: It depends. :) First off, how do you get to 1000 versions? In 0.94++ older version are pruned upon flush, so you need 333 flushes (assuming 3 versions on the CF) to get 1000 versions. I forgot that the default number of version to keep is 3. If this is what people use most of the time, yes you are right for this type of scenarios where the number of version per column to keep is small. By that time some compactions will have happened and you're back to close to 3 versions (maybe 9, 12, or 15 or so, depending on how store files you have). Now, if you have that many version because because you set VERSIONS=1000 in your CF... Then imagine you have 100 columns with 1000 versions each. Yes, imagine I set VERSIONS = Long.MAX_VALUE (i.e. I will manage the versioning myself) In your scenario below you'd do 10 comparisons if the filter would be evaluated after the version counting. But only 1100 with the current code. (or at least in that ball park) This is where I don't quite understand what you mean. if the framework counts the number of ReturnCode.INCLUDE and then stops feeding the KeyValue into the filterKeyValue method after it reaches the count specified in setMaxVersions (i.e. 
1 for the case we discussed), should then be just 100 comparisons only (at most) instead of 1100 comparisons? Maybe I don't understand how the current way is doing... The gist is: One can construct scenarios where one approach is better than the other. Only one order is possible. If you write a custom filter and you care about these things you should use the seek hints. -- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Cc: Sent: Tuesday, August 28, 2012 7:17 AM Subject: Re: setTimeRange and setMaxVersions seem to be inefficient Hi Lars: Thanks for the reply. I need to understand if I misunderstood the perceived inefficiency because it seems you don't think quite the same. Let say, as an example, we have 1 row with 2 columns (col-1 and col-2) in a table and each column has 1000 versions. Using the following code (the code might have errors and don't compile): /** * This is very simple use case
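Following Ted's pointers, a rough sketch of the two remaining pieces for the filter shown above. It assumes the filter also keeps the target column family (so the hint can name an exact cell), which is an addition of mine, and that the per-row state is cleared in reset():

// Hint used when filterKeyValue returns SEEK_NEXT_USING_HINT: jump the scanner
// straight to (current row, target family, target qualifier).
@Override
public KeyValue getNextKeyHint(KeyValue currentKV) {
  return KeyValue.createFirstOnRow(currentKV.getRow(), this.family, this.qualifier);
}

// Called between rows; drop the per-row bookkeeping so the next row starts clean.
@Override
public void reset() {
  previousIncludedQualifier = null;
}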
Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Lars: Thanks for the reply. I need to understand if I misunderstood the perceived inefficiency because it seems you don't think quite the same. Let say, as an example, we have 1 row with 2 columns (col-1 and col-2) in a table and each column has 1000 versions. Using the following code (the code might have errors and don't compile): /** * This is very simple use case of a ColumnPrefixFilter. * In fact all other filters that make use of filterKeyValue will see similar * performance problems that I have concerned with when the number of * versions per column could be huge. Filter filter = new ColumnPrefixFilter(Bytes.toBytes(col-2)); Scan scan = new Scan(); scan.setFilter(filter); ResultScanner scanner = table.getScanner(scan); for (Result result : scanner) { for (KeyValue kv : result.raw()) { System.out.println(KV: + kv + , Value: + Bytes.toString(kv.getValue())); } } scanner.close(); */ Implicitly, the number of version per column that is going to return is 1 (the latest version). User might expect that only 2 comparisons for column prefix are needed (1 for col-1 and 1 for col-2) but in fact, it processes the filterKeyValue method in ColumnPrefixFilter 1000 times (1 for col-1 and 1000 for col-2) for col-2 (1 per version) because all versions of the column have the same prefix for obvious reason. For col-1, it will skip using SEEK_NEXT_USING_HINT which should skip the 99 versions of col-1. In summary, the 1000 comparisons (5000 byte comparisons) for the column prefix col-2 is wasted because only 1 version is returned to user. Also, I believe this inefficiency is hidden from the user code but it affects all filters that use filterKeyValue as the main execution for filtering KVs. Do we have a case to improve HBase to handle this inefficiency? :) It seems valid unless you prove otherwise. Best Regards, Jerry On Tue, Aug 28, 2012 at 12:54 AM, lars hofhansl lhofha...@yahoo.com wrote: First off regarding inefficiency... If version counting would happen first and then filter were executed we'd have folks complaining about inefficiencies as well: (Why does the code have to go through the versioning stuff when my filter filters the row/column/version anyway?) ;-) For your problem, you want to make use of seek hints... In addition to INCLUDE you can return NEXT_COL, NEXT_ROW, or even SEEK_NEXT_USING_HINT from Filter.filterKeyValue(...). That way the scanning framework will know to skip ahead to the next column, row, or a KV of your choosing. (see Filter.filterKeyValue and Filter.getNextKeyHint). (as an aside, it would probably be nice if Filters also had INCLUDE_AND_NEXT_COL, INCLUDE_AND_NEXT_ROW, internally used by StoreScanner) Have a look at ColumnPrefixFilter as an example. I also wrote a short post here: http://hadoop-hbase.blogspot.com/2012/01/filters-in-hbase-or-intra-row-scanning.html Does that help? -- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Cc: user@hbase.apache.org user@hbase.apache.org Sent: Monday, August 27, 2012 5:59 PM Subject: Re: setTimeRange and setMaxVersions seem to be inefficient Hi Lars: Thanks for confirming the inefficiency of the implementation for this case. For my case, a column can have more than 10K versions, I need a quick way to stop the scan from digging the column once there is a match (ReturnCode.INCLUDE). It would be nice to have a ReturnCode that can notify the framework to stop and go to next column once the number of versions specify in setMaxVersions is met. 
For now, I guess I have to hack it in the custom filter (I.e. I keep the count myself)? If you have a better way to achieve this, please share :) Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-08-27, at 20:11, lars hofhansl lhofha...@yahoo.com wrote: Currently filters are evaluated before we do version counting. Here's a comment from ScanQueryMatcher.java: /** * Filters should be checked before checking column trackers. If we do * otherwise, as was previously being done, ColumnTracker may increment its * counter for even that KV which may be discarded later on by Filter. This * would lead to incorrect results in certain cases. */ So this is by design. (Doesn't mean it's correct or desirable, though.) -- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user user@hbase.apache.org Cc: Sent: Monday, August 27, 2012 2:40 PM Subject: setTimeRange and setMaxVersions seem to be inefficient Hi HBase community: I tried to use setTimeRange and setMaxVersions to limit the number of KVs return per column. The behaviour is as I would expect that is setTimeRange(0, T + 1) and setMaxVersions(1) will give me ONE version of KV with timestamp that is less than or equal to T. However, I noticed that all versions
Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Lars: I see. Please refer to the inline comment below. Best Regards, Jerry On Tue, Aug 28, 2012 at 2:21 PM, lars hofhansl lhofha...@yahoo.com wrote: What I was saying was: It depends. :) First off, how do you get to 1000 versions? In 0.94++ older version are pruned upon flush, so you need 333 flushes (assuming 3 versions on the CF) to get 1000 versions. I forgot that the default number of version to keep is 3. If this is what people use most of the time, yes you are right for this type of scenarios where the number of version per column to keep is small. By that time some compactions will have happened and you're back to close to 3 versions (maybe 9, 12, or 15 or so, depending on how store files you have). Now, if you have that many version because because you set VERSIONS=1000 in your CF... Then imagine you have 100 columns with 1000 versions each. Yes, imagine I set VERSIONS = Long.MAX_VALUE (i.e. I will manage the versioning myself) In your scenario below you'd do 10 comparisons if the filter would be evaluated after the version counting. But only 1100 with the current code. (or at least in that ball park) This is where I don't quite understand what you mean. if the framework counts the number of ReturnCode.INCLUDE and then stops feeding the KeyValue into the filterKeyValue method after it reaches the count specified in setMaxVersions (i.e. 1 for the case we discussed), should then be just 100 comparisons only (at most) instead of 1100 comparisons? Maybe I don't understand how the current way is doing... The gist is: One can construct scenarios where one approach is better than the other. Only one order is possible. If you write a custom filter and you care about these things you should use the seek hints. -- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com Cc: Sent: Tuesday, August 28, 2012 7:17 AM Subject: Re: setTimeRange and setMaxVersions seem to be inefficient Hi Lars: Thanks for the reply. I need to understand if I misunderstood the perceived inefficiency because it seems you don't think quite the same. Let say, as an example, we have 1 row with 2 columns (col-1 and col-2) in a table and each column has 1000 versions. Using the following code (the code might have errors and don't compile): /** * This is very simple use case of a ColumnPrefixFilter. * In fact all other filters that make use of filterKeyValue will see similar * performance problems that I have concerned with when the number of * versions per column could be huge. Filter filter = new ColumnPrefixFilter(Bytes.toBytes(col-2)); Scan scan = new Scan(); scan.setFilter(filter); ResultScanner scanner = table.getScanner(scan); for (Result result : scanner) { for (KeyValue kv : result.raw()) { System.out.println(KV: + kv + , Value: + Bytes.toString(kv.getValue())); } } scanner.close(); */ Implicitly, the number of version per column that is going to return is 1 (the latest version). User might expect that only 2 comparisons for column prefix are needed (1 for col-1 and 1 for col-2) but in fact, it processes the filterKeyValue method in ColumnPrefixFilter 1000 times (1 for col-1 and 1000 for col-2) for col-2 (1 per version) because all versions of the column have the same prefix for obvious reason. For col-1, it will skip using SEEK_NEXT_USING_HINT which should skip the 99 versions of col-1. In summary, the 1000 comparisons (5000 byte comparisons) for the column prefix col-2 is wasted because only 1 version is returned to user. 
Also, I believe this inefficiency is hidden from the user code but it affects all filters that use filterKeyValue as the main execution for filtering KVs. Do we have a case to improve HBase to handle this inefficiency? :) It seems valid unless you prove otherwise. Best Regards, Jerry On Tue, Aug 28, 2012 at 12:54 AM, lars hofhansl lhofha...@yahoo.com wrote: First off regarding inefficiency... If version counting would happen first and then filter were executed we'd have folks complaining about inefficiencies as well: (Why does the code have to go through the versioning stuff when my filter filters the row/column/version anyway?) ;-) For your problem, you want to make use of seek hints... In addition to INCLUDE you can return NEXT_COL, NEXT_ROW, or even SEEK_NEXT_USING_HINT from Filter.filterKeyValue(...). That way the scanning framework will know to skip ahead to the next column, row, or a KV of your choosing. (see Filter.filterKeyValue and Filter.getNextKeyHint). (as an aside, it would probably be nice if Filters also had INCLUDE_AND_NEXT_COL, INCLUDE_AND_NEXT_ROW, internally used by StoreScanner) Have a look at ColumnPrefixFilter as an example. I also wrote a short post here: http://hadoop-hbase.blogspot.com/2012/01/filters-in-hbase-or-intra-row-scanning.html
Re: Column Value Reference Timestamp Filter
Hi Alex: We decided to use setTimeRange and setMaxVersions, and remove the column with a reference timestamp (i.e. we don't put this column into hbase anymore). This behavior is what we would like but it seems very inefficient because all versions are processed before the setMaxVersions takes effect (I just posted some new findings in another post). Best Regards, Jerry On Mon, Aug 20, 2012 at 4:47 PM, Alex Baranau alex.barano...@gmail.comwrote: Hi, So, you have row with key rowKeyA and column col1. And it contains two values value1 and value2 at timestamp1 and timestamp2 respectively, where timestamp1 is most recent. And you want to fetch most recent but one values in all columns when doing the scan. I.e. you don't know the timestamp1 or timestamp2 exactly you just need to fetch the value which was placed before the most recent one. Is that correct? Don't think there's some filter that would allow you to do so out-of-the-box. You should probably be able to write such filter and use scan.setMaxVersions(2). Not sure if keyvalues are fed into filter ordered by their timestamp.. How about returning 2 most recent values to the client and filtering on the client-side? Why this doesn't work in your case? (large values in columns in size or?). Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Mon, Aug 20, 2012 at 2:57 PM, Jerry Lam chiling...@gmail.com wrote: Hi HBase community: I have a requirement in which I need to query a row based on the timestamp stored in the value of a column of a row. For example. (rowkeyA of col1) - (value) at timestamp = t1, (value) stores t2. Result should return all columns of rowkeyA at timestamp = t2. Note that t1 t2 ALWAYS. Can this sound like something that can be done using Filter? If yes, can it be done using the existing filters in HBase without customization? Best Regards, Jerry
Re: setTimeRange and setMaxVersions seem to be inefficient
Hi Lars: Thanks for confirming the inefficiency of the implementation for this case. For my case, a column can have more than 10K versions, I need a quick way to stop the scan from digging the column once there is a match (ReturnCode.INCLUDE). It would be nice to have a ReturnCode that can notify the framework to stop and go to next column once the number of versions specify in setMaxVersions is met. For now, I guess I have to hack it in the custom filter (I.e. I keep the count myself)? If you have a better way to achieve this, please share :) Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-08-27, at 20:11, lars hofhansl lhofha...@yahoo.com wrote: Currently filters are evaluated before we do version counting. Here's a comment from ScanQueryMatcher.java: /** * Filters should be checked before checking column trackers. If we do * otherwise, as was previously being done, ColumnTracker may increment its * counter for even that KV which may be discarded later on by Filter. This * would lead to incorrect results in certain cases. */ So this is by design. (Doesn't mean it's correct or desirable, though.) -- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user user@hbase.apache.org Cc: Sent: Monday, August 27, 2012 2:40 PM Subject: setTimeRange and setMaxVersions seem to be inefficient Hi HBase community: I tried to use setTimeRange and setMaxVersions to limit the number of KVs return per column. The behaviour is as I would expect that is setTimeRange(0, T + 1) and setMaxVersions(1) will give me ONE version of KV with timestamp that is less than or equal to T. However, I noticed that all versions of the KeyValue for a particular column are processed through a custom filter I implemented even though I specify setMaxVersions(1) and setTimeRange(0, T+1). I expected that if ONE KV of a particular column has ReturnCode.INCLUDE, the framework will jump to the next COL instead of iterating through all versions of the column. Can someone confirm me if this is the expected behaviour (iterating through all versions of a column before setMaxVersions take effect)? If this is an expected behaviour, what is your recommendation to speed this up? Best Regards, Jerry
Column Value Reference Timestamp Filter
Hi HBase community: I have a requirement in which I need to query a row based on the timestamp stored in the value of a column of a row. For example. (rowkeyA of col1) - (value) at timestamp = t1, (value) stores t2. Result should return all columns of rowkeyA at timestamp = t2. Note that t1 t2 ALWAYS. Can this sound like something that can be done using Filter? If yes, can it be done using the existing filters in HBase without customization? Best Regards, Jerry
Re: Disk space usage of HFilev1 vs HFilev2
Hi Anil: Maybe you can try to compare the two HFile implementations directly? Let's say you write 1000 rows into HFile v1 format and then into HFile v2 format. You can then compare the size of the two directly. HTH, Jerry On Tue, Aug 14, 2012 at 3:36 PM, anil gupta anilgupt...@gmail.com wrote: Hi Zahoor, Then it seems like I might have missed something when doing the hdfs usage estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME for getting the hdfs usage of a table. Is this the right way? Since I wiped off the HBase 0.90 cluster, I now cannot look into its hdfs usage. Is it possible to store a table in HFileV1 instead of HFileV2 in HBase 0.92? In this way I can do a fair comparison. Thanks, Anil Gupta On Tue, Aug 14, 2012 at 12:13 PM, jmozah jmo...@gmail.com wrote: Hi Anil, I really doubt that there is a 50% drop in file sizes... As far as I know.. there is no drastic space conserving feature in V2. Just as an afterthought.. do a major compact and check the sizes. ./Zahoor http://blog.zahoor.in On 15-Aug-2012, at 12:31 AM, anil gupta anilgupt...@gmail.com wrote: -- Thanks Regards, Anil Gupta
Re: multitable query
Hi Wei: There is a jira Hbase-3996, does this sound something you are looking for? Regards, Jerry On Friday, August 10, 2012, Bryan Beaudreault wrote: Use 3 jobs: 1 to scan each table. The third could do a map-side join. Make sure to use the same sort and partitions on the first two. Sent from iPhone. On Aug 10, 2012, at 9:41 AM, Weishung Chung weish...@gmail.comjavascript:; wrote: but they are in production now On Fri, Aug 10, 2012 at 6:39 AM, Weishung Chung weish...@gmail.comjavascript:; wrote: Thank you, I am trying to avoid to fetch by gets and would like to do something like hadoop MultipleInputs. Yes, it would be nice if i could denormalize and remodel the schema. On Fri, Aug 10, 2012 at 6:29 AM, Amandeep Khurana ama...@gmail.comjavascript:; wrote: You can scan over one of the tables (using TableInputFormat) and do simple gets on the other table for every row that you want to join. An interesting question to address here would be - why even need a join. Can you talk more about the data and what you are trying to do? In general you really want to denormalize and not need joins when working with HBase (or for that matter most NoSQL stores). On Fri, Aug 10, 2012 at 6:52 PM, Weishung Chung weish...@gmail.comjavascript:; wrote: Basically a join of two data sets on the same row key. On Fri, Aug 10, 2012 at 6:12 AM, Amandeep Khurana ama...@gmail.comjavascript:; wrote: How do you want to use two tables? Can you explain your algo a bit? On Fri, Aug 10, 2012 at 6:40 PM, Weishung Chung weish...@gmail.comjavascript:; wrote: Hi HBase users, I need to pull data from 2 HBase tables in a mapreduce job. For 1 table input, I use TableMapReduceUtil.initTableMapperJob. Is there another method for multitable inputs ? Thank you, Wei Shung
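A hedged sketch of what HBASE-3996 enables once it is in your release (it landed around 0.94.5/0.96 as MultiTableInputFormat): each Scan is tagged with its table name and the whole list is handed to initTableMapperJob. The constant and overload names are my reading of that JIRA, and the mapper body is a placeholder:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class MultiTableJoinSetup {
  static class JoinMapper extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context) {
      // the map-side join/aggregation over rows from both tables would go here
    }
  }

  static void configure(Job job) throws Exception {
    List<Scan> scans = new ArrayList<Scan>();
    Scan scanA = new Scan();
    scanA.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("tableA"));
    scans.add(scanA);
    Scan scanB = new Scan();
    scanB.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes("tableB"));
    scans.add(scanB);
    TableMapReduceUtil.initTableMapperJob(scans, JoinMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
  }
}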
Re: CheckAndAppend Feature
Hi Lars: This helps a lot! Thanks! Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-08-07, at 20:30, lars hofhansl lhofha...@yahoo.com wrote: I filed HBASE-6522. It is a trivial change to make locks and leases available to coprocessors. So checkAndSet type operations can then be implemented via coprocessor endpoints: lock row, check, fail or update, unlock row. Since the patch is so simple I'll commit that soon (to 0.94.2 and 0.96) -- Lars From: lars hofhansl lhofha...@yahoo.com To: user@hbase.apache.org user@hbase.apache.org Sent: Tuesday, August 7, 2012 8:55 AM Subject: Re: CheckAndAppend Feature There is no such functionality currently, and there is no good way to simulate that. Currently that cannot even be done with a coprocessor endpoint, because region coprocessors have no way to create a region lock (just checked the code). (That is something we have to change I think - I will create an issue once the Jira system is back from the walk in the park). -- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user user@hbase.apache.org Cc: Sent: Tuesday, August 7, 2012 8:22 AM Subject: CheckAndAppend Feature Hi HBase community: I checked the HTable API, it has checkAndPut and checkAndDelete but I'm looking for checkAndAppend. Is there a way to simulate similarly? For instance, I want to check the last 32 bytes of a value (let assume that it has 128 bytes in total) in a column before appending atomically some values into it. Thanks! Jerry
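While waiting for row locks to become available to coprocessors, one client-side workaround is to emulate checkAndAppend optimistically with a read plus checkAndPut: verify the tail of the current value, then swap in old+suffix only if the cell is still unchanged, and retry if it is not. This is only a hedged sketch of that idea, not anything HBase provides:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndAppend {
  // Returns true if the append was applied; false if the tail check failed or
  // another writer updated the cell between our read and the checkAndPut.
  static boolean checkAndAppend(HTable table, byte[] row, byte[] family, byte[] qualifier,
                                byte[] expectedTail, byte[] suffix) throws IOException {
    Get get = new Get(row);
    get.addColumn(family, qualifier);
    byte[] current = table.get(get).getValue(family, qualifier);
    if (current == null || current.length < expectedTail.length) {
      return false;
    }
    if (!Bytes.equals(Bytes.tail(current, expectedTail.length), expectedTail)) {
      return false;                                  // tail check failed
    }
    Put put = new Put(row);
    put.add(family, qualifier, Bytes.add(current, suffix));
    // atomic compare-and-swap against the full value we read
    return table.checkAndPut(row, family, qualifier, current, put);
  }
}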
Re: HBaseTestingUtility on windows
Hi Mohit: You might need to install Cygwin if the tool has dependency on Linux command like bash. Best Regards, Jerry On Friday, August 3, 2012, N Keywal wrote: Hi Mohit, For simple cases, it works for me for hbase 0.94 at least. But I'm not sure it works for all features. I've never tried to run hbase unit tests on windows for example. N. On Fri, Aug 3, 2012 at 6:01 AM, Mohit Anchlia mohitanch...@gmail.comjavascript:; wrote: I am trying to run mini cluster using HBaseTestingUtility Class from hbase tests on windows, but I get bash command error. Is it not possible to run this utility class on windows? I followed this example: http://blog.sematext.com/2010/08/30/hbase-case-study-using-hbasetestingutility-for-local-testing-development/
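For reference, a minimal sketch of what such a test does with the 0.94-era API; it is the mini-cluster start-up below that shells out to Unix commands and therefore wants Cygwin on Windows. Table and family names are arbitrary:

import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class MiniClusterSmokeTest {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    util.startMiniCluster();                         // in-process HDFS + ZK + HBase
    try {
      HTable table = util.createTable(Bytes.toBytes("test"), Bytes.toBytes("cf"));
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      table.put(put);
      table.close();
    } finally {
      util.shutdownMiniCluster();
    }
  }
}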
Re: Filter with State
Hi Lars: That is useful. I appreciate it. The idea about cross row transaction is an interesting one. Can I have an iterator on the client side that get rows from a coprocessor? (i.e. Filtered rows are streamed into the client application and client can access them via iterator) Best Regards, Jerry On Thu, Aug 2, 2012 at 12:13 AM, lars hofhansl lhofha...@yahoo.com wrote: The Filter is initialized per Region as part of a RegionScannerImpl. So as long as all the rows you are interested are co-located in the same region you can keep that state in the Filter instance. You can use a custom RegionSplitPolicy to control (to some extend at least) how the rows are colocated (KeyPrefixRegionSplitPolicy is an example). I also blogged about this here (in the context of cross row transactions): http://hadoop-hbase.blogspot.com/2012/02/limited-cross-row-transactions-in-hbase.html Maybe what you really are looking for are coprocessors? -- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Wednesday, August 1, 2012 7:06 PM Subject: Re: Filter with State Hi Lars, I understand that it is more difficult to carry states across regions/servers, how about in a single region? Knowing that the rows in a single region have dependencies, can we have filter with state? If filter doesn't provide this ability, is there other mechanism in hbase to offer this kind of functionalities? I think this is a good feature because it allows efficient scanning on dependent rows. Instead of fetching each row to the client side and check if we should fetch the next row, the filter on the server side handles this logic. Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-08-01, at 21:52, lars hofhansl lhofha...@yahoo.com wrote: The issue here is that different rows can be located in different regions or even different region servers, so no local state will carry over all rows. - Original Message - From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Cc: user@hbase.apache.org user@hbase.apache.org Sent: Wednesday, August 1, 2012 5:48 PM Subject: Re: Filter with State Hi St.Ack: Schema cannot be changed to a single row. The API describes Do not rely on filters carrying state across rows; its not reliable in current hbase as we have no handlers in place for when regions split, close or server crashes. If we manage region splitting ourselves, so the split issue doesn't apply. Other failures can be handled on the application level. Does each invocation of scanner.next instantiate a new filter at the server side even on the same region (I.e. Does scanning on the same region use the same filter or different filter depending on the scanner.next calls??) Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-08-01, at 18:44, Stack st...@duboce.net wrote: On Wed, Aug 1, 2012 at 10:44 PM, Jerry Lam chiling...@gmail.com wrote: Hi HBase guru: From Lars George talk, he mentions that filter has no state. What if I need to scan rows in which the decision to filter one row or not is based on the previous row's column values? Any idea how one can implement this type of logic? You could try carrying state in the client (but if client dies state dies). You can't have scanners carry state across rows. It says so in API http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description (Whatever about the API, if LarsG says it, it must be so!). 
Here is the issue: If row X is in region A on server 1 there is nothing to prevent row X+1 from being on region B on server 2. How do you carry the state between such rows reliably? Can you redo your schema such that the state you need to carry remains within a row? St.Ack
Re: sync on writes
I believe you are talking about enabling dfs.support.append feature? I benchmarked the difference (disable/enable) previously and I don't find much differences. It would be great if someone else can confirm on this. Best Regards, Jerry On Wednesday, August 1, 2012, Alex Baranau wrote: I believe that this is *not default*, but *current* implementation of sync(). I.e. (please correct me if I'm wrong) n-way write approach is not available yet. You might confuse it with the fact that by default, sync() is called on every edit. And you can change it by using deferred log flushing. Either way, sync() is going to be a pipelined write. There's an explanation of benefits of pipelined and n-way writes there in the book (p337), it's not just about which approach provides better durability of saved edits. Both of them do. But both can take different time to execute and utilize network differently: pipelined *may* be slower but can saturate network bandwidth better. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Tue, Jul 31, 2012 at 9:09 PM, Mohit Anchlia mohitanch...@gmail.comjavascript:; wrote: In the HBase book it mentioned that the default behaviour of write is to call sync on each node before sending replica copies to the nodes in the pipeline. Is there a reason this was kept default because if data is getting written on multiple nodes then likelyhood of losing data is really low since another copy is always there on the replica nodes. Is it ok to make this sync async and is it advisable? -- Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
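To make the deferred-log-flush point concrete, a hedged sketch with the 0.92-era API; table and family names are placeholders, and both knobs trade durability for latency, which is exactly the concern being discussed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalTuning {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Per-table: group WAL syncs instead of syncing on every edit.
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("mytable"));
    desc.setDeferredLogFlush(true);
    admin.disableTable("mytable");
    admin.modifyTable(Bytes.toBytes("mytable"), desc);
    admin.enableTable("mytable");

    // Per-edit: skip the WAL entirely for this one Put (the edit is lost if the
    // region server dies before the memstore flushes).
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    put.setWriteToWAL(false);
  }
}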
Re: Query a version of a column efficiently
Thanks Suraj. I looked at the code but it looks like the logic is not self-contained, particularly for the way hbase works with search for a specific version using TimeRange. Best Regards, Jerry On Mon, Jul 30, 2012 at 12:53 PM, Suraj Varma svarma...@gmail.com wrote: You may need to setup your Eclipse workspace and search using references etc.To get started, this is one class that uses TimeRange based matching ... org.apache.hadoop.hbase.regionserver.ScanQueryMatcher Also - Get is internally implemented as a Scan over a single row. Hope this gets you started. --Suraj On Thu, Jul 26, 2012 at 4:34 PM, Jerry Lam chiling...@gmail.com wrote: Hi St.Ack: Can you tell me which source code is responsible for the logic. The source code in the get and scan doesnt provide an indication of how the setTimeRange works. Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-07-26, at 18:30, Stack st...@duboce.net wrote: On Thu, Jul 26, 2012 at 11:40 PM, Jerry Lam chiling...@gmail.com wrote: Hi St.Ack: Let say there are 5 versions for a column A with timestamp = [0, 1, 3, 6, 10]. I want to execute an efficient query that returns one version of the column that has a timestamp that is equal to 5 or less. So in this case, it should return the value of the column A with timestamp = 3. Using the setTimeRange(5, Long.MAX_VALUE) with setMaxVersion = 1, my guess is that it will return the version 6 not version 3. Correct me if I'm wrong. What Tom says, try it. IIUC, it'll give you your 3. It won't give you 6 since that is outside of the timerange (try 0 instead of MAX_VALUE; I may have misled w/ MAX_VALUE... it might work but would have to check code). St.Ack
Re: Filter with State
Hi St.Ack: Schema cannot be changed to a single row. The API describes Do not rely on filters carrying state across rows; its not reliable in current hbase as we have no handlers in place for when regions split, close or server crashes. If we manage region splitting ourselves, so the split issue doesn't apply. Other failures can be handled on the application level. Does each invocation of scanner.next instantiate a new filter at the server side even on the same region (I.e. Does scanning on the same region use the same filter or different filter depending on the scanner.next calls??) Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-08-01, at 18:44, Stack st...@duboce.net wrote: On Wed, Aug 1, 2012 at 10:44 PM, Jerry Lam chiling...@gmail.com wrote: Hi HBase guru: From Lars George talk, he mentions that filter has no state. What if I need to scan rows in which the decision to filter one row or not is based on the previous row's column values? Any idea how one can implement this type of logic? You could try carrying state in the client (but if client dies state dies). You can't have scanners carry state across rows. It says so in API http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description (Whatever about the API, if LarsG says it, it must be so!). Here is the issue: If row X is in region A on server 1 there is nothing to prevent row X+1 from being on region B on server 2. How do you carry the state between such rows reliably? Can you redo your schema such that the state you need to carry remains within a row? St.Ack
Re: Filter with State
Hi Lars, I understand that it is more difficult to carry states across regions/servers, how about in a single region? Knowing that the rows in a single region have dependencies, can we have filter with state? If filter doesn't provide this ability, is there other mechanism in hbase to offer this kind of functionalities? I think this is a good feature because it allows efficient scanning on dependent rows. Instead of fetching each row to the client side and check if we should fetch the next row, the filter on the server side handles this logic. Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-08-01, at 21:52, lars hofhansl lhofha...@yahoo.com wrote: The issue here is that different rows can be located in different regions or even different region servers, so no local state will carry over all rows. - Original Message - From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Cc: user@hbase.apache.org user@hbase.apache.org Sent: Wednesday, August 1, 2012 5:48 PM Subject: Re: Filter with State Hi St.Ack: Schema cannot be changed to a single row. The API describes Do not rely on filters carrying state across rows; its not reliable in current hbase as we have no handlers in place for when regions split, close or server crashes. If we manage region splitting ourselves, so the split issue doesn't apply. Other failures can be handled on the application level. Does each invocation of scanner.next instantiate a new filter at the server side even on the same region (I.e. Does scanning on the same region use the same filter or different filter depending on the scanner.next calls??) Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-08-01, at 18:44, Stack st...@duboce.net wrote: On Wed, Aug 1, 2012 at 10:44 PM, Jerry Lam chiling...@gmail.com wrote: Hi HBase guru: From Lars George talk, he mentions that filter has no state. What if I need to scan rows in which the decision to filter one row or not is based on the previous row's column values? Any idea how one can implement this type of logic? You could try carrying state in the client (but if client dies state dies). You can't have scanners carry state across rows. It says so in API http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description (Whatever about the API, if LarsG says it, it must be so!). Here is the issue: If row X is in region A on server 1 there is nothing to prevent row X+1 from being on region B on server 2. How do you carry the state between such rows reliably? Can you redo your schema such that the state you need to carry remains within a row? St.Ack
Re: How to query by rowKey-infix
Hi Chris: I'm thinking about building a secondary index for primary key lookup, then query using the primary keys in parallel. I'm interested to see if there is other option too. Best Regards, Jerry On Tue, Jul 31, 2012 at 11:27 AM, Christian Schäfer syrious3...@yahoo.dewrote: Hello there, I designed a row key for queries that need best performance (~100 ms) which looks like this: userId-date-sessionId These queries(scans) are always based on a userId and sometimes additionally on a date, too. That's no problem with the key above. However, another kind of queries shall be based on a given time range whereas the outermost left userId is not given or known. In this case I need to get all rows covering the given time range with their date to create a daily reporting. As I can't set wildcards at the beginning of a left-based index for the scan, I only see the possibility to scan the index of the whole table to collect the rowKeys that are inside the timerange I'm interested in. Is there a more elegant way to collect rows within time range X? (Unfortunately, the date attribute is not equal to the timestamp that is stored by hbase automatically.) Could/should one maybe leverage some kind of row key caching to accelerate the collection process? Is that covered by the block cache? Thanks in advance for any advice. regards Chris
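A bare-bones sketch of the secondary-index idea, assuming the main table keeps the userId-date-sessionId key and a second table leads with the date so that a time-range report can discover the primary keys first; all table and family names are invented for the example:

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class EventWriter {
  // Writes the event to the main table and a pointer row to the date index.
  static void writeEvent(HTable main, HTable dateIndex, String userId, String date,
                         String sessionId, byte[] payload) throws IOException {
    byte[] primaryKey = Bytes.toBytes(userId + "-" + date + "-" + sessionId);
    Put event = new Put(primaryKey);
    event.add(Bytes.toBytes("d"), Bytes.toBytes("payload"), payload);
    main.put(event);

    // The index key leads with the date, so "all rows between date X and Y"
    // becomes a plain start/stop-row scan of the index table.
    Put pointer = new Put(Bytes.toBytes(date + "-" + userId + "-" + sessionId));
    pointer.add(Bytes.toBytes("i"), Bytes.toBytes("pk"), primaryKey);
    dateIndex.put(pointer);
  }
}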
Query a version of a column efficiently
Hi HBase guru: I need some advises on a problem that I'm facing using HBase. How can I efficiently query a version of a column when I don't know exactly the version I'm looking for? For instance, I want to query a column with timestamp that is less or equal to N, if version = N is available, return it to me. Otherwise, I want the version that is closest to the version N (order by descending of timestamp). That is if version = N - 1 exists, I want it to be returned. I looked into the TimeRange query, it doesn't seem to provide this semantic naturally. Note that I don't know which version is closest to N so the setTimeRange(0,N+1). Do I need to implement a filter to do that or is it already available? Any help will be appreciated. Best Regards, Jerry
Re: Query a version of a column efficiently
Hi St.Ack: Let's say there are 5 versions of a column A with timestamps = [0, 1, 3, 6, 10]. I want to execute an efficient query that returns one version of the column that has a timestamp that is equal to 5 or less. So in this case, it should return the value of column A with timestamp = 3. Using setTimeRange(5, Long.MAX_VALUE) with setMaxVersions(1), my guess is that it will return version 6, not version 3. Correct me if I'm wrong. Best Regards, Jerry On Thu, Jul 26, 2012 at 5:13 PM, Stack st...@duboce.net wrote: On Thu, Jul 26, 2012 at 7:43 PM, Jerry Lam chiling...@gmail.com wrote: I need some advice on a problem that I'm facing using HBase. How can I efficiently query a version of a column when I don't know exactly the version I'm looking for? For instance, I want to query a column with a timestamp that is less than or equal to N; if version = N is available, return it to me. Otherwise, I want the version that is closest to version N (ordered by descending timestamp). That is, if version = N - 1 exists, I want it to be returned. Have you tried a timerange w/ minStamp of N and maxStamp of HConstants#LATEST_TIMESTAMP (Long.MAX_VALUE) returning one version only (setMaxVersions(1))? St.Ack
Re: Query a version of a column efficiently
Hi St.Ack: Can you tell me which source code is responsible for the logic. The source code in the get and scan doesnt provide an indication of how the setTimeRange works. Best Regards, Jerry Sent from my iPad (sorry for spelling mistakes) On 2012-07-26, at 18:30, Stack st...@duboce.net wrote: On Thu, Jul 26, 2012 at 11:40 PM, Jerry Lam chiling...@gmail.com wrote: Hi St.Ack: Let say there are 5 versions for a column A with timestamp = [0, 1, 3, 6, 10]. I want to execute an efficient query that returns one version of the column that has a timestamp that is equal to 5 or less. So in this case, it should return the value of the column A with timestamp = 3. Using the setTimeRange(5, Long.MAX_VALUE) with setMaxVersion = 1, my guess is that it will return the version 6 not version 3. Correct me if I'm wrong. What Tom says, try it. IIUC, it'll give you your 3. It won't give you 6 since that is outside of the timerange (try 0 instead of MAX_VALUE; I may have misled w/ MAX_VALUE... it might work but would have to check code). St.Ack
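Putting St.Ack's suggestion into code, a small sketch for the example above (column A with versions at 0, 1, 3, 6, 10 and target N = 5); the upper bound of the time range is exclusive, so N + 1 is passed. Family and qualifier names follow the example:

import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionAtOrBelow {
  // Returns the newest cell of cf:A whose timestamp is <= n (3 in the example).
  static KeyValue newestAtOrBelow(HTable table, byte[] row, long n) throws IOException {
    Get get = new Get(row);
    get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("A"));
    get.setTimeRange(0, n + 1);     // covers [0, n], since the max stamp is exclusive
    get.setMaxVersions(1);
    Result result = table.get(get);
    return result.getColumnLatest(Bytes.toBytes("cf"), Bytes.toBytes("A"));
  }
}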
Re: Scanning columns
Hi, This sounds like you are looking for ColumnRangeFilter? Best Regards, Jerry On Wednesday, July 18, 2012, Mohit Anchlia wrote: I am designing an HBase schema as a time-series model. Taking advice from the definitive guide and tsdb, I am planning to use my row key as metricname:Long.MAX_VALUE - basetimestamp. And the column names would be timestamp - basetimestamp. My column names would then look like 1, 2, 3, 4, 5 ... for instance. I am looking at the Java API to see if I can do a range scan of columns; can I say fetch me columns starting at 1 and stopping at 4? I see a scanner class for row scans but am wondering if columns are sorted before storing and if I can do a range scan on them too.
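A minimal sketch of using ColumnRangeFilter for that column range (the table name, metric name, and base timestamp are placeholders). One caveat worth noting: if the qualifiers are written as decimal strings they compare lexicographically ("10" sorts before "4"), so a fixed-width or binary encoding of the offsets keeps the range semantics sane.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ColumnRangeExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "metrics");                  // hypothetical table name
      long baseTimestamp = 1342569600000L;                         // hypothetical base timestamp
      byte[] rowKey = Bytes.toBytes("cpu.load:" + (Long.MAX_VALUE - baseTimestamp));

      Get get = new Get(rowKey);
      // Return only qualifiers between "1" and "4", both ends inclusive.
      get.setFilter(new ColumnRangeFilter(
          Bytes.toBytes("1"), true,
          Bytes.toBytes("4"), true));
      Result result = table.get(get);
      for (KeyValue kv : result.raw()) {
        System.out.println(Bytes.toString(kv.getQualifier()) + " = " + Bytes.toString(kv.getValue()));
      }
      table.close();
    }
  }

The same filter can be set on a Scan when the range of columns is needed across many metric rows rather than a single Get.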
Re: WAL corruption
My understanding is that the WAL is used for replication as well. If all your data has been persisted to disk (i.e. all data in the memstores has been flushed to disk) and replication is disabled, I believe you can delete the WAL without data loss. Just my 2 cents. On 2012-07-02, at 1:37 PM, Bryan Keller wrote: During an upgrade of my cluster from 0.90 to 0.92 over the weekend, the WAL (files in the /hbase/.logs directory) was corrupted and it prevented HBase from starting up. The exact exception was java.io.IOException: Could not obtain the last block locations on the WAL files. I was able to recover by deleting the /hbase/.logs directory. My question is, if HBase had no pending updates, i.e. nothing writing to it, is there any risk of data loss by deleting the WAL directory? For example, does rebalancing, flushing, or compaction use the WAL or is the WAL used only for inserts/updates/deletes?
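One way to reduce the risk before touching the WAL directory is to force-flush the memstores first, so that everything the WAL protects is already persisted in HFiles. A small sketch using the 0.90/0.92-era admin API (the table name is a placeholder; repeat per table):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class FlushBeforeWalCleanup {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      // Ask the regionservers to flush this table's memstores to HFiles.
      // Note the flush request is asynchronous, so give the regionservers
      // time to finish before doing anything destructive to the WALs.
      admin.flush("mytable");   // hypothetical table name
    }
  }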
Re: Recovering corrupt HLog files
This is interesting because I saw this happen in the past. Can WALPlayer be back-ported to 0.90.x? Best Regards, Jerry Sent from my iPad On 2012-06-30, at 16:34, Li Pi l...@idle.li wrote: Nope. It came out in 0.94 otoh. On Sat, Jun 30, 2012 at 12:29 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: I should have mentioned in my initial email that I am operating on HBase 0.90.4. Is WALPlayer available in this version? I am having trouble finding it or anything similar. On Sat, Jun 30, 2012 at 1:14 PM, Li Pi l...@idle.li wrote: WALPlayer will look at the timestamp. Replaying an older edit that has since been overwritten shouldn't change anything. On Sat, Jun 30, 2012 at 9:49 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: They are all pretty large, around 40+ MB. Will WALPlayer be smart enough to only write edits that still look relevant (i.e. based on timestamps of the edits vs timestamps of the versions in HBase)? Writes have been coming in since we recovered. On Sat, Jun 30, 2012 at 11:05 AM, Stack st...@duboce.net wrote: On Sat, Jun 30, 2012 at 8:38 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: 12/06/30 00:00:48 INFO wal.HLogSplitter: Got while parsing hlog hdfs://my-namenode-ip-addr:8020/hbase/.logs/my-rs-ip-addr,60020,1338667719591/my-rs-ip-addr%3A60020.1340935453874. Marking as corrupted What size are these logs? We are back to stable operation now, and in trying to research this I found the hdfs://my-namenode-ip-addr:8020/hbase/.corrupt directory. There are 20 files listed there. Ditto. What are our options for tracking down and potentially recovering any data that was lost? Or how can we even tell what was lost, if any? Does the existence of these files pretty much guarantee data loss? There doesn't seem to be much documentation on this. From reading it seems like it might be possible that part of each of these files was recovered. If they are larger than size 0, you could try WAL-playing them: http://hbase.apache.org/book.html#walplayer St.Ack
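For readers on 0.94 or later: per the HBase reference guide, WALPlayer is a MapReduce job that replays the edits in a WAL/HLog directory into one or more live tables, and a typical invocation looks like bin/hbase org.apache.hadoop.hbase.mapreduce.WALPlayer <wal inputdir> <tables> (the input directory and table names here are whatever applies to your cluster, e.g. the recovered or .corrupt log directory). As Li Pi notes above, edits are replayed with their original timestamps, so an older edit that has since been overwritten should not clobber newer data.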
Re: direct Hfile Read and Writes
Hi Samar: I have used LoadIncrementalHFiles (the completebulkload tool) successfully in the past. Basically, once you have written an HFile yourself you can use LoadIncrementalHFiles to merge it with the HFiles currently managed by HBase. Once it is loaded into HBase, the records in the incremental HFile are accessible by clients. HTH, Jerry On Wed, Jun 27, 2012 at 10:33 AM, shixing paradise...@gmail.com wrote: 1. Since the data we might need would be distributed across regions, how would direct reading of HFiles be helpful? You can read HFilePrettyPrinter; it shows how to create an HFile.Reader and use it to read an HFile. Or you can use ./hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f hdfs:///xxx/hfile to print some info and have a look. 2. Any use-case for direct writes of HFiles? If we write HFiles, will that data be accessible to the hbase shell? You can read HFileOutputFormat; it shows how to create an HFile.Writer and use it to directly write KVs to an HFile. If you want to read the data from the hbase shell, you should first bulk-load the HFile into the regionservers; details on bulk load: http://hbase.apache.org/book.html#arch.bulk.load . On Wed, Jun 27, 2012 at 6:49 PM, samar kumar samar.opensou...@gmail.com wrote: Hi HBase Users, I have seen APIs supporting direct HFile reads and writes. I do understand it would create HFiles in the location specified, and it should be much faster since we would skip all the lookups to ZK, the catalog table, and the RS, but can anyone point me to a particular case when we would like to read/write directly? 1. Since the data we might need would be distributed across regions, how would direct reading of HFiles be helpful? 2. Any use-case for direct writes of HFiles? If we write HFiles, will that data be accessible to the hbase shell? Regards, Samar -- Best wishes! My Friend~
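To make the bulk-load step concrete, a minimal sketch against the 0.92-era client API (the HFile directory and table name are placeholders; the directory is expected to contain one subdirectory per column family, e.g. as produced by HFileOutputFormat):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

  public class BulkLoadExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");              // existing target table (hypothetical name)
      LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
      // Moves the HFiles under /tmp/hfiles into the table's regions,
      // splitting them if they straddle region boundaries; afterwards the
      // rows are visible to normal clients and the hbase shell.
      loader.doBulkLoad(new Path("/tmp/hfiles"), table);
      table.close();
    }
  }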
Re: HFile Performance
Hi Elliott: Great! I will look into it ~ Best Regards, Jerry On Thu, Jun 21, 2012 at 6:24 PM, Elliott Clark ecl...@stumbleupon.com wrote: HFilePerformanceEvaluation is in the source tree hbase-server/src/test. I haven't played with it myself but it might help you. On Thu, Jun 21, 2012 at 3:13 PM, Jerry Lam chiling...@gmail.com wrote: Hi HBase guru, I would like to benchmark HFile performance without other components in HBase. I know that I can use HFile like any other file format in Hadoop IO. I wonder if there is an HFile benchmark available so I don't end up reinventing the wheel. Best Regards, Jerry
Re: Hbase Replication
Hi Mohammad: The current HBase replication (as far as I understand) is designed to replicate data from one data center to another data center. The client application has no knowledge of the zookeeper ensemble in the slave cluster and therefore it cannot switch to the slave cluster in case of a system failure. You might want to implement a smart client that can switch to the slave cluster when it detects a failure. Be aware that the data might not be immediately available on the slave cluster (i.e. the master cluster might have more up-to-date data than the slave cluster even if you have configured replication). HTH, Jerry On Fri, Jun 22, 2012 at 5:18 PM, Mohammad Tariq donta...@gmail.com wrote: Hello list, I was going through the HBase replication documentation (at http://hbase.apache.org/replication.html) to get myself clear on the concepts. One thing which I could not find is whether it is possible to configure the replication in such a way that if my master cluster goes down the slave cluster will automatically take its place. Need some advice/comments from the experts. Many thanks. Regards, Mohammad Tariq
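A rough sketch of the "smart client" idea (this is not a built-in HBase feature; the cluster names, ZooKeeper quorums, and table name are invented): keep one Configuration per cluster and fall back to the slave's quorum when an operation against the master cluster fails, keeping in mind that replication is asynchronous, so the slave may lag behind.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class FailoverClient {
    public static void main(String[] args) throws IOException {
      Configuration masterConf = HBaseConfiguration.create();
      masterConf.set("hbase.zookeeper.quorum", "zk-master-1,zk-master-2,zk-master-3");

      Configuration slaveConf = HBaseConfiguration.create();
      slaveConf.set("hbase.zookeeper.quorum", "zk-slave-1,zk-slave-2,zk-slave-3");

      Get get = new Get(Bytes.toBytes("row1"));
      Result result;
      try {
        // In practice the client retry/timeout settings need tuning so this
        // fails fast instead of blocking for the default retry budget.
        HTable table = new HTable(masterConf, "mytable");
        result = table.get(get);
        table.close();
      } catch (IOException e) {
        // Master cluster unreachable: retry against the slave cluster.
        // The slave may be behind the master because replication is asynchronous.
        HTable table = new HTable(slaveConf, "mytable");
        result = table.get(get);
        table.close();
      }
      System.out.println(result);
    }
  }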
HFile Performance
Hi HBase guru, I would like to benchmark HFile performance without other components in HBase. I know that I can use HFile like any other file format in Hadoop IO. I wonder if there is an HFile benchmark available so I don't end up reinventing the wheel. Best Regards, Jerry
Re: Isolation level
Hi Cristina: My understanding of HBase is that the isolation level for read ops is Read Committed. There is only a write lock, which protects the data from being modified by other requests; there is no read lock (it exists but doesn't have any effect). Since put ops are atomic, they can succeed or fail but cannot be left half-applied, so clients can only read the data once the write op succeeds. HTH, Jerry On Fri, Jun 15, 2012 at 7:59 AM, Cristina cristi_...@hotmail.com wrote: Hi, I have read that HBase has read committed as its isolation level, but I have some doubts. Is it possible to change this level, for instance to read uncommitted? How could I do this? Another question: is this isolation level based on locks? I have doubts because HBase has multiversion concurrency control, so it may implement read committed snapshot or snapshot isolation. Thanks, Cristina
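Not a way to change the isolation level, but related to the atomicity point above: HBase exposes an atomic per-row compare-and-set through checkAndPut, so a client never needs an explicit lock for a simple read-modify-write. A small sketch (the table, family, qualifier, and values are placeholders):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class CheckAndPutExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "accounts");   // hypothetical table name
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("state"), Bytes.toBytes("ACTIVE"));
      // The Put is applied atomically only if cf:state still equals "PENDING"
      // at the moment of the check; otherwise applied is false.
      boolean applied = table.checkAndPut(
          Bytes.toBytes("row1"), Bytes.toBytes("cf"), Bytes.toBytes("state"),
          Bytes.toBytes("PENDING"), put);
      System.out.println("applied: " + applied);
      table.close();
    }
  }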
Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write
Hi Himanshu: Thanks for following up! I did look up the logs and there were some exceptions. I'm not sure if those exceptions contributed to the problem I saw a week ago. I was aware of the latency between the time that the master said "Nothing to replicate" and the time it takes to actually replicate on the slave. I remember waiting 12 hours for the replication to finish (i.e. starting the test before leaving the office and checking the result the next day) and the data was still not fully replicated. By the way, is your test running with master-slave replication or master-master replication? I will resume this again; I was busy with something else for the past week or so. Best Regards, Jerry On 2012-05-01, at 6:41 PM, Himanshu Vashishtha wrote: Hello Jerry, Did you try this again. Whenever you try next, can you please share the logs somehow. I tried replicating your scenario today, but no luck. I used the same workload you have copied here; master cluster has 5 nodes and slave has just 2 nodes; and made tiny regions of 8MB (memstore flushing at 8mb too), so that I have around 1200+ regions even for 200k rows; ran the workload with 16, 24 and 32 client threads, but the verifyrep mapreduce job says its good. Yes, I ran the verifyrep command after seeing there is nothing to replicate message on all the regionservers; sometimes it was a bit slow. Thanks, Himanshu On Mon, Apr 23, 2012 at 11:57 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: I will try your suggestion today with a master-slave replication enabled from Cluster A - Cluster B. Please do. Last Friday, I tried to limit the variability/the moving part of the replication components. I reduced the size of Cluster B to have only 1 regionserver and having Cluster A to replicate data from one region only without region splitting (therefore I have 1-to-1 region replication setup). During the benchmark, I moved the region between different regionservers in Cluster A (note there are still 3 regionservers in Cluster A). I ran this test for 5 times and no data were lost. Does it mean something? My feeling is there are some glitches/corner cases that have not been covered in the cyclic replication (or hbase replication in general). Note that, this happens only when the load is high. And have you looked at the logs? Any obvious exceptions coming up? Replication uses the normal HBase client to insert the data on the other cluster and this is what handles regions moving around. By the way, why do we need to have a zookeeper not handled by hbase for the replication to work (it is described in the hbase documentation)? It says you *should* do it, not you *need* to do it :) But basically replication is zk-heavy and getting a better understanding of it starts with handling it yourself. J-D
Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write
Hi Himanshu: My team is particularly interested in the cyclic replication so I have enabled the master-master replication (so each cluster has the other cluster as its replication peer), although the replication was one direction (from cluster A to cluster B) in the test. I didn't stop_replication on the other cluster if that is what you mean by disabling the replication. Thanks! Jerry On 2012-05-01, at 10:08 PM, Himanshu Vashishtha wrote: Yeah, I should have mentioned that: it's master-master, and on cdh4b1. But, replication on that specific slave table is disabled (so, effectively it's master-slave for this test). Is this the same as yours (replication config wise), or shall I enable replication on the destination table too? Thanks, Himanshu On Tue, May 1, 2012 at 8:01 PM, Jerry Lam chiling...@gmail.com wrote: Hi Himanshu: Thanks for following up! I did looked up the log and there were some exceptions. I'm not sure if those exceptions contribute to the problem I've seen a week ago. I did aware of the latency between the time that the master said Nothing to replicate and the actual time it takes to actually replicate on the slave. I remember I wait 12 hours for the replication to finish (i.e. start the test before leaving office and check the result the next day) and data still not fully replicated. By the way, is your test running with master-slave replication or master-master replication? I will resume this again. I was busy on something else for the past week or so. Best Regards, Jerry On 2012-05-01, at 6:41 PM, Himanshu Vashishtha wrote: Hello Jerry, Did you try this again. Whenever you try next, can you please share the logs somehow. I tried replicating your scenario today, but no luck. I used the same workload you have copied here; master cluster has 5 nodes and slave has just 2 nodes; and made tiny regions of 8MB (memstore flushing at 8mb too), so that I have around 1200+ regions even for 200k rows; ran the workload with 16, 24 and 32 client threads, but the verifyrep mapreduce job says its good. Yes, I ran the verifyrep command after seeing there is nothing to replicate message on all the regionservers; sometimes it was a bit slow. Thanks, Himanshu On Mon, Apr 23, 2012 at 11:57 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: I will try your suggestion today with a master-slave replication enabled from Cluster A - Cluster B. Please do. Last Friday, I tried to limit the variability/the moving part of the replication components. I reduced the size of Cluster B to have only 1 regionserver and having Cluster A to replicate data from one region only without region splitting (therefore I have 1-to-1 region replication setup). During the benchmark, I moved the region between different regionservers in Cluster A (note there are still 3 regionservers in Cluster A). I ran this test for 5 times and no data were lost. Does it mean something? My feeling is there are some glitches/corner cases that have not been covered in the cyclic replication (or hbase replication in general). Note that, this happens only when the load is high. And have you looked at the logs? Any obvious exceptions coming up? Replication uses the normal HBase client to insert the data on the other cluster and this is what handles regions moving around. By the way, why do we need to have a zookeeper not handled by hbase for the replication to work (it is described in the hbase documentation)? 
It says you *should* do it, not you *need* to do it :) But basically replication is zk-heavy and getting a better understanding of it starts with handling it yourself. J-D
Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write
Hi Lars: I will try your suggestion today with a master-slave replication enabled from Cluster A to Cluster B. Last Friday, I tried to limit the variability/the moving parts of the replication components. I reduced the size of Cluster B to only 1 regionserver and had Cluster A replicate data from one region only, without region splitting (therefore I have a 1-to-1 region replication setup). During the benchmark, I moved the region between different regionservers in Cluster A (note there are still 3 regionservers in Cluster A). I ran this test 5 times and no data was lost. Does it mean something? My feeling is there are some glitches/corner cases that have not been covered in the cyclic replication (or hbase replication in general). Note that this happens only when the load is high. By the way, why do we need to have a zookeeper not handled by hbase for the replication to work (it is described in the hbase documentation)? Best Regards, Jerry On 2012-04-20, at 7:08 PM, lars hofhansl wrote: I see. Does this only happen when cyclic replication is enabled in this way (i.e. master - master replication). The replication back does take some overhead as the replicator needs to filter edits from being replication back to the originator, but I would not have thought that would cause any issues. Could you run the same test once with replication only enabled from ClusterA - ClusterB? Thanks. -- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Friday, April 20, 2012 3:43 PM Subject: Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write Hi Himanshu: I'm using hbase 0.92.1 and hadoop 1.0.1 migrating from hbase 0.90.4 and Hadoop 0.20 with append feature. It is a one side replication (cluster A to cluster B) with cyclic replication enabled (i.e. add_peer of the other cluster configured). Best Regards, Jerry Sent from my iPad On 2012-04-20, at 10:23, Himanshu Vashishtha hvash...@cs.ualberta.ca wrote: Hello Jerry, Which HBase version? You are not using cyclic replication? Its simple one side replication, right? Thanks, Himanshu On Fri, Apr 20, 2012 at 5:38 AM, Jerry Lam chiling...@gmail.com wrote: Hi HBase community: We have been testing cyclic replication for 1 week. The basic functionality seems to work as described in the document however when we started to increase the write workload, the replication starts to miss data (i.e. some data are not replicated to the other cluster). We have narrowed down to a scenario that we can reproduce the problem quite consistently and here it is: - Setup: - We have setup 2 clusters (cluster A and cluster B)with identical size in terms of number of nodes and configuration, 3 regionservers sit on top of 3 datanodes. - Cyclic replication is enabled. - We use YCSB to generate load to hbase the workload is very similar to workloada: recordcount=20 operationcount=20 workload=com.yahoo.ycsb.workloads.CoreWorkload fieldcount=1 fieldlength=25000 readallfields=true writeallfields=true readproportion=0 updateproportion=1 scanproportion=0 insertproportion=0 requestdistribution=uniform - Records are inserted into Cluster A. After the benchmark is done and wait until all data are replicated to Cluster B, we used verifyrep mapreduce job for validation. - Data are deleted from both table (truncate 'tablename') before a new experiment is started. 
Scenario: when we increase the number of threads until it max out the throughput of the cluster, we saw some data are missing in Cluster B (total count != 20) although cluster A clearly has them all. This happens even though we disabled region splitting in both clusters (it happens more often when region splits occur). To further having more control of what is happening, we then decided to disable the load balancer so the region (which is responsible for the replicating data) will not relocate to other regionserver during the benchmark. The situation improves a lot. We don't see any missing data in 5 continuous runs. Finally, we decided to move the region around from a regionserver to another regionserver during the benchmark to see if the problem will reappear and it did. We believe that the issue could be related to region splitting and load balancing during intensive write, the hbase replication strategy hasn't yet cover those corner cases. Can someone take a look of it and suggest some ways to workaround this? Thanks~ Jerry
HBase Cyclic Replication Issue: some data are missing in the replication for intensive write
Hi HBase community: We have been testing cyclic replication for 1 week. The basic functionality seems to work as described in the documentation; however, when we started to increase the write workload, the replication started to miss data (i.e. some data are not replicated to the other cluster). We have narrowed it down to a scenario in which we can reproduce the problem quite consistently, and here it is: - Setup: - We have set up 2 clusters (cluster A and cluster B) with identical size in terms of number of nodes and configuration; 3 regionservers sit on top of 3 datanodes. - Cyclic replication is enabled. - We use YCSB to generate load to HBase; the workload is very similar to workloada: recordcount=20 operationcount=20 workload=com.yahoo.ycsb.workloads.CoreWorkload fieldcount=1 fieldlength=25000 readallfields=true writeallfields=true readproportion=0 updateproportion=1 scanproportion=0 insertproportion=0 requestdistribution=uniform - Records are inserted into Cluster A. After the benchmark is done and we have waited until all data are replicated to Cluster B, we use the verifyrep mapreduce job for validation. - Data are deleted from both tables (truncate 'tablename') before a new experiment is started. Scenario: when we increase the number of threads until it maxes out the throughput of the cluster, we see some data missing in Cluster B (total count != 20) although cluster A clearly has them all. This happens even though we disabled region splitting in both clusters (it happens more often when region splits occur). To have more control over what is happening, we then decided to disable the load balancer so the region (which is responsible for the data being replicated) will not relocate to another regionserver during the benchmark. The situation improved a lot: we don't see any missing data in 5 consecutive runs. Finally, we decided to move the region from one regionserver to another during the benchmark to see if the problem would reappear, and it did. We believe that the issue could be related to region splitting and load balancing during intensive writes; the HBase replication strategy hasn't yet covered those corner cases. Can someone take a look at it and suggest some ways to work around this? Thanks~ Jerry
Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write
Hi Himanshu: I'm using hbase 0.92.1 and hadoop 1.0.1 migrating from hbase 0.90.4 and Hadoop 0.20 with append feature. It is a one side replication (cluster A to cluster B) with cyclic replication enabled (i.e. add_peer of the other cluster configured). Best Regards, Jerry Sent from my iPad On 2012-04-20, at 10:23, Himanshu Vashishtha hvash...@cs.ualberta.ca wrote: Hello Jerry, Which HBase version? You are not using cyclic replication? Its simple one side replication, right? Thanks, Himanshu On Fri, Apr 20, 2012 at 5:38 AM, Jerry Lam chiling...@gmail.com wrote: Hi HBase community: We have been testing cyclic replication for 1 week. The basic functionality seems to work as described in the document however when we started to increase the write workload, the replication starts to miss data (i.e. some data are not replicated to the other cluster). We have narrowed down to a scenario that we can reproduce the problem quite consistently and here it is: - Setup: - We have setup 2 clusters (cluster A and cluster B)with identical size in terms of number of nodes and configuration, 3 regionservers sit on top of 3 datanodes. - Cyclic replication is enabled. - We use YCSB to generate load to hbase the workload is very similar to workloada: recordcount=20 operationcount=20 workload=com.yahoo.ycsb.workloads.CoreWorkload fieldcount=1 fieldlength=25000 readallfields=true writeallfields=true readproportion=0 updateproportion=1 scanproportion=0 insertproportion=0 requestdistribution=uniform - Records are inserted into Cluster A. After the benchmark is done and wait until all data are replicated to Cluster B, we used verifyrep mapreduce job for validation. - Data are deleted from both table (truncate 'tablename') before a new experiment is started. Scenario: when we increase the number of threads until it max out the throughput of the cluster, we saw some data are missing in Cluster B (total count != 20) although cluster A clearly has them all. This happens even though we disabled region splitting in both clusters (it happens more often when region splits occur). To further having more control of what is happening, we then decided to disable the load balancer so the region (which is responsible for the replicating data) will not relocate to other regionserver during the benchmark. The situation improves a lot. We don't see any missing data in 5 continuous runs. Finally, we decided to move the region around from a regionserver to another regionserver during the benchmark to see if the problem will reappear and it did. We believe that the issue could be related to region splitting and load balancing during intensive write, the hbase replication strategy hasn't yet cover those corner cases. Can someone take a look of it and suggest some ways to workaround this? Thanks~ Jerry
Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write
Hi Lars: I'm using hbase 0.92.1 and Hadoop 1.0.1. Yes, you are right. I'm replicating from cluster A to cluster B only with cyclic replication configured. Eventually I will test replicating cluster A to cluster B and vice versa with high intensive write workload but if this replication doesn't work for one way, we need to think about other solutions. No data loss in cluster A for sure. Best Regards, Jerry Sent from my iPad On 2012-04-20, at 15:34, lars hofhansl lhofha...@yahoo.com wrote: Hi Jerry, which version of HBase are you using? You are not using cyclic backup, that needs 2 clusters. I assume you're just replicating from one cluster to another, right? There is never data loss in Cluster A? -- Lars - Original Message - From: Jerry Lam chiling...@gmail.com To: user@hbase.apache.org Cc: Sent: Friday, April 20, 2012 5:38 AM Subject: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write Hi HBase community: We have been testing cyclic replication for 1 week. The basic functionality seems to work as described in the document however when we started to increase the write workload, the replication starts to miss data (i.e. some data are not replicated to the other cluster). We have narrowed down to a scenario that we can reproduce the problem quite consistently and here it is: - Setup: - We have setup 2 clusters (cluster A and cluster B)with identical size in terms of number of nodes and configuration, 3 regionservers sit on top of 3 datanodes. - Cyclic replication is enabled. - We use YCSB to generate load to hbase the workload is very similar to workloada: recordcount=20 operationcount=20 workload=com.yahoo.ycsb.workloads.CoreWorkload fieldcount=1 fieldlength=25000 readallfields=true writeallfields=true readproportion=0 updateproportion=1 scanproportion=0 insertproportion=0 requestdistribution=uniform - Records are inserted into Cluster A. After the benchmark is done and wait until all data are replicated to Cluster B, we used verifyrep mapreduce job for validation. - Data are deleted from both table (truncate 'tablename') before a new experiment is started. Scenario: when we increase the number of threads until it max out the throughput of the cluster, we saw some data are missing in Cluster B (total count != 20) although cluster A clearly has them all. This happens even though we disabled region splitting in both clusters (it happens more often when region splits occur). To further having more control of what is happening, we then decided to disable the load balancer so the region (which is responsible for the replicating data) will not relocate to other regionserver during the benchmark. The situation improves a lot. We don't see any missing data in 5 continuous runs. Finally, we decided to move the region around from a regionserver to another regionserver during the benchmark to see if the problem will reappear and it did. We believe that the issue could be related to region splitting and load balancing during intensive write, the hbase replication strategy hasn't yet cover those corner cases. Can someone take a look of it and suggest some ways to workaround this? Thanks~ Jerry