Re: Reply: flushing + compactions after config change
bq. On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com wrote: It's not random, it picks the region with the most data in its memstores. That's weird, because I see some of my regions which receive the least amount of data in a given time period flushing before the regions that are receiving data continuously.

I agree with Viral here. When max logs are reached, we look at the oldest WAL and see which regions should be flushed in order to get that first (read: oldest) WAL archived. In your case, Viral, these regions could be those which are not receiving many edits when 32 logs have been rolled. It may be very specific to your use case, but you could try playing with the max number of logs? Maybe make it 16, 40, etc.?

On Fri, Jun 28, 2013 at 4:53 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:

On Fri, Jun 28, 2013 at 2:39 PM, Viral Bajaria viral.baja...@gmail.com wrote:

On Fri, Jun 28, 2013 at 9:31 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:

On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com wrote: It's not random, it picks the region with the most data in its memstores.

That's weird, because I see some of my regions which receive the least amount of data in a given time period flushing before the regions that are receiving data continuously. The reason I know this is because of the write pattern. Some of my tables are in catch-up mode, i.e. I am ingesting data from the past and they always have something to do, while some tables are not in catch-up mode and are just sitting idle most of the time. Yet I see a high number of flushes for those regions too.

I doubt that the fact that it's a major compaction is making everything worse. When a minor gets promoted into a major it's because we're already going to compact all the files, so we might as well get rid of some deletes at the same time. They are all getting selected because the files are within the selection ratio. I would not focus on this to resolve your problem.

I meant worse for my writes, not for HBase as a whole.

I haven't been closely following this thread, but have you posted a log snippet somewhere? It's usually much more telling and we eliminate a few levels of interpretation. Make sure it's at DEBUG, and that you grab a few hours of activity. Get the GC log for the same time as well. Drop this on a web server or pastebin if it fits.

The only log snippet that I posted was the flushing action. Also, that log was not everything; I had grep'd a few lines out. Let me collect some more stats here and post it again. I just enabled GC logging on this server; I deployed the wrong config initially, which had no GC logging. I am not sure how GC logs will help here given that I am at less than 50% heap space used, so I would doubt a stop-the-world GC is happening. Are you trying to look for some other information?

Just trying to cover all the bases.

J-D
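For anyone who wants to experiment with the suggestion above, the relevant knobs are server-side settings that belong in the region servers' hbase-site.xml. A minimal sketch of the property names and types, assuming 0.94-era names; the values shown are only examples, and the programmatic form is just to illustrate them:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class WalTuningSketch {
    public static void main(String[] args) {
      Configuration conf = HBaseConfiguration.create();
      // Max number of WAL files before regions are force-flushed so the
      // oldest log can be archived (the 32 mentioned above is the default).
      conf.setInt("hbase.regionserver.maxlogs", 16);
      // Per-region memstore flush threshold, in bytes (default 128 MB).
      conf.setLong("hbase.hregion.memstore.flush.size", 128L * 1024 * 1024);
      System.out.println("maxlogs = " + conf.getInt("hbase.regionserver.maxlogs", 32));
    }
  }

Lowering maxlogs makes log-forced flushes happen more often but in smaller batches; raising it defers them, which is the trade-off being discussed here.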
question about hbase environment variable
If I set HBASE_HEAPSIZE=2 (the heap is 20G), can I set the JVM options -Xmx20g -Xms20g? If not, how much can I set?
lzo lib missing, region server cannot start
I added lzo compression in the config file, but the region server cannot start; it seems the lzo lib is missing. How can I install the lzo lib for HBase, and which compression is used in production, snappy or lzo? Thanks all.

# /etc/init.d/hadoop-hbase-regionserver start
starting regionserver, logging to /var/log/hbase/hbase-hbase-regionserver-CH34.out
Exception in thread "main" java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer
  at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2805)
  at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:60)
  at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:75)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2829)
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)

[root@CH34 ~]# less /var/log/hbase/hbase-hbase-regionserver-CH34.out
Exception in thread "main" java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer
  at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2805)
  at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:60)
  at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:75)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2829)
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2803)
  ... 5 more
Caused by: java.io.IOException: Compression codec lzo not supported, aborting RS construction
  at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:295)
  ... 10 more

# hbase org.apache.hadoop.hbase.util.CompressionTest file:///root/jdk-6u35-linux-amd64.rpm lzo
13/07/01 15:45:05 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
  at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm$1.getCodec(Compression.java:110)
  at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:234)
  at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
  at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
  at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
  at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
  at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:515)
  at org.apache.hadoop.hbase.util.CompressionTest.doSmokeTest(CompressionTest.java:108)
  at org.apache.hadoop.hbase.util.CompressionTest.main(CompressionTest.java:134)
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
  at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
  at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm$1.getCodec(Compression.java:105)
  ... 8 more
Re: How many column families in one table ?
Thanks Dhaval/Michael/Ted/Otis for your replies. Actually, I asked this question because I am seeing some performance degradation in my production HBase setup. I have configured HBase in pseudo-distributed mode on top of HDFS. I have created 17 column families :( and am actually using 14 of those 17. Each column family has on average 8-10 column qualifiers, so there are around 140 columns in total for each row key. I have around 1.6 million rows in the table. To completely scan the table for all 140 columns takes around 30-40 minutes. Is that normal, or should I redesign my table schema (probably merging 4-5 column families into one, so that at the end I have just 3-4 CFs)?

On Sat, Jun 29, 2013 at 12:06 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hm, works for me - http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning Shorter version: http://search-hadoop.com/m/qOx8l15Z1q42 Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm

On Fri, Jun 28, 2013 at 8:40 AM, Vimal Jain vkj...@gmail.com wrote: Hi all, thanks for your replies. Ted, thanks for the link, but it's not working. :(

On Fri, Jun 28, 2013 at 5:57 PM, Ted Yu yuzhih...@gmail.com wrote: Vimal: Please also refer to: http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning

On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel michael_se...@hotmail.com wrote: Short answer... As few as possible. 14 CFs doesn't make too much sense. Sent from a remote device. Please excuse any typos... Mike Segel

On Jun 28, 2013, at 12:20 AM, Vimal Jain vkj...@gmail.com wrote: Hi, How many column families should there be in an HBase table? Is there any performance issue in read/write if we have more column families? I have designed one table with around 14 column families in it, each having on average 6 qualifiers. Is it a good design?

-- Thanks and Regards, Vimal Jain
Re: How many column families in one table ?
When you did the scan, did you check what the bottleneck was ? Was it I/O ? Did you see any GC locks ? How much RAM are you giving to your RS ? -Viral On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: To completely scan the table for all 140 columns , it takes around 30-40 minutes.
Re: How many column families in one table ?
I scanned it during normal traffic hours. There was no I/O load on the server, and I don't see any GC locks either. Also, I have given 1.5G to the RS and 512M each to the Master and ZooKeeper. One correction to the post above: the actual time to scan the whole table is even more; it takes 10 mins to scan 0.1 million rows (so a total of 2.5 hours to scan 1.6 million rows). The time I mentioned in the previous post was for a different type of lookup, please ignore that.

On Mon, Jul 1, 2013 at 2:24 PM, Viral Bajaria viral.baja...@gmail.com wrote: When you did the scan, did you check what the bottleneck was ? Was it I/O ? Did you see any GC locks ? How much RAM are you giving to your RS ? -Viral

On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: To completely scan the table for all 140 columns , it takes around 30-40 minutes.

-- Thanks and Regards, Vimal Jain
Re: lzo lib missing ,region server can not start
Please take a look at http://hbase.apache.org/book.html#lzo.compression and the links in that section. Cheers

On Mon, Jul 1, 2013 at 3:57 PM, ch huang justlo...@gmail.com wrote: i add lzo compression in config file ,but region server can not start,it seems lzo lib is miss,how can i install lzo lib for hbase,and in production which compress is used ? snappy or lzo? thanks all [quoted stack traces snipped]
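As a footnote to the book section Ted points to: the codec jars and native libraries have to be present on every region server, and compression is then enabled per column family. A rough sketch against the 0.94 client API (the table and family names here are made up; run CompressionTest successfully on each node before creating the table):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.io.hfile.Compression;

  public class CreateCompressedTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      HTableDescriptor table = new HTableDescriptor("mytable");
      HColumnDescriptor family = new HColumnDescriptor("d");
      // SNAPPY (or LZO) only works once the codec is installed on every RS.
      family.setCompressionType(Compression.Algorithm.SNAPPY);
      table.addFamily(family);
      admin.createTable(table);
      admin.close();
    }
  }

On the snappy-vs-lzo question: Snappy only needs the Hadoop native libraries plus libsnappy, whereas LZO needs the separately distributed (GPL) hadoop-lzo jar and native library, which is one reason many production setups end up on Snappy.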
HBASE-7846 : is it safe to use on 0.94.4 ?
Hi, Just wanted to check if it's safe to use the JIRA mentioned in the subject i.e. https://issues.apache.org/jira/browse/HBASE-7846 Thanks, Viral
Re: Issues with delete markers
That would be quite a dramatic change; we cannot pass delete markers to the existing filters without confusing them. We could invent a new method (filterDeleteKV or filterDeleteMarker or something) on filters, along with a new filter type that implements that method. -- Lars

- Original Message - From: Varun Sharma va...@pinterest.com To: d...@hbase.apache.org d...@hbase.apache.org; user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 1:56 PM Subject: Re: Issues with delete markers

Sorry, typo, I meant that for user scans, should we be passing delete markers through the filters as well? Varun

On Sun, Jun 30, 2013 at 1:03 PM, Varun Sharma va...@pinterest.com wrote: For user scans, I feel we should be passing delete markers through as well.

On Sun, Jun 30, 2013 at 12:35 PM, Varun Sharma va...@pinterest.com wrote: I tried this a little bit and it seems that filters are not called on delete markers. For raw scans returning delete markers, does it make sense to do that? Varun

On Sun, Jun 30, 2013 at 12:03 PM, Varun Sharma va...@pinterest.com wrote: Hi, We are having an issue with the way HBase handles deletes. We are looking to retrieve 300 columns in a row, but the row has tens of thousands of delete markers in it before we reach the 300 columns, something like this:

row DeleteCol1 Col1 DeleteCol2 Col2 ... DeleteCol3 Col3

And so on. The issue here is that to retrieve these 300 columns, we need to go through tens of thousands of deletes - sometimes we get a spurt of these queries and that DDoSes a region server. We are okay with saying: only return the first 300 columns and stop once you encounter, say, 5K column delete markers or something. I wonder if such a construct is provided by HBase, or do we need to build something on top of a RAW scan and handle the delete masking there? Thanks Varun
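For what it's worth, the client-side version of the workaround being discussed can be sketched roughly like this against the 0.94 API (table name, row key and thresholds are placeholders; it only counts delete markers and stops early, it does not do the masking a server-side solution would do):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class RawScanDeleteCounter {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");
      // Scan just this one row; the stop row ("myrow" + a 0x00 byte) is exclusive.
      Scan scan = new Scan(Bytes.toBytes("myrow"), Bytes.toBytes("myrow\0"));
      scan.setRaw(true);      // return delete markers and deleted cells too
      scan.setMaxVersions();  // raw scans should not collapse versions
      scan.setBatch(100);     // limit the number of columns per Result
      int deleteMarkers = 0, columns = 0;
      ResultScanner scanner = table.getScanner(scan);
      try {
        Result r;
        while ((r = scanner.next()) != null) {
          for (KeyValue kv : r.raw()) {
            if (kv.isDelete()) deleteMarkers++; else columns++;
          }
          if (columns >= 300 || deleteMarkers >= 5000) break;  // stop early
        }
      } finally {
        scanner.close();
        table.close();
      }
      System.out.println(columns + " columns, " + deleteMarkers + " delete markers");
    }
  }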
Re: Poor HBase map-reduce scan performance
Absolutely. - Original Message - From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail of HBASE-8369, there were some comments which are yet to be addressed. I think trunk patch should be finalized before backporting. Cheers On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller brya...@gmail.com wrote: I'll attach my patch to HBASE-8369 tomorrow. On Jun 28, 2013, at 10:56 AM, lars hofhansl la...@apache.org wrote: If we can make a clean patch with minimal impact to existing code I would be supportive of a backport to 0.94. -- Lars - Original Message - From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Cc: Sent: Tuesday, June 25, 2013 1:56 AM Subject: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning functional on my system. Performance is dramatically better, as expected i suppose. I'm seeing about 3.6x faster performance vs TableInputFormat. Also, HBase doesn't get bogged down during a scan as the regionserver is being bypassed. I'm very excited by this. There are some issues with file permissions and library dependencies but nothing that can't be worked out. On Jun 5, 2013, at 6:03 PM, lars hofhansl la...@apache.org wrote: That's exactly the kind of pre-fetching I was investigating a bit ago (made a patch, but ran out of time). This pre-fetching is strictly client only, where the client keeps the server busy while it is processing the previous batch, but filling up a 2nd buffer. -- Lars From: Sandy Pratt prat...@adobe.com To: user@hbase.apache.org user@hbase.apache.org Sent: Wednesday, June 5, 2013 10:58 AM Subject: Re: Poor HBase map-reduce scan performance Yong, As a thought experiment, imagine how it impacts the throughput of TCP to keep the window size at 1. That means there's only one packet in flight at a time, and total throughput is a fraction of what it could be. That's effectively what happens with RPC. The server sends a batch, then does nothing while it waits for the client to ask for more. During that time, the pipe between them is empty. Increasing the batch size can help a bit, in essence creating a really huge packet, but the problem remains. There will always be stalls in the pipe. What you want is for the window size to be large enough that the pipe is saturated. A streaming API accomplishes that by stuffing data down the network pipe as quickly as possible. Sandy On 6/5/13 7:55 AM, yonghu yongyong...@gmail.com wrote: Can anyone explain why client + rpc + server will decrease the performance of scanning? I mean the Regionserver and Tasktracker are the same node when you use MapReduce to scan the HBase table. So, in my understanding, there will be no rpc cost. Thanks! Yong On Wed, Jun 5, 2013 at 10:09 AM, Sandy Pratt prat...@adobe.com wrote: https://issues.apache.org/jira/browse/HBASE-8691 On 6/4/13 6:11 PM, Sandy Pratt prat...@adobe.com wrote: Haven't had a chance to write a JIRA yet, but I thought I'd pop in here with an update in the meantime. I tried a number of different approaches to eliminate latency and bubbles in the scan pipeline, and eventually arrived at adding a streaming scan API to the region server, along with refactoring the scan interface into an event-drive message receiver interface. 
In so doing, I was able to take scan speed on my cluster from 59,537 records/sec with the classic scanner to 222,703 records per second with my new scan API. Needless to say, I'm pleased ;) More details forthcoming when I get a chance. Thanks, Sandy On 5/23/13 3:47 PM, Ted Yu yuzhih...@gmail.com wrote: Thanks for the update, Sandy. If you can open a JIRA and attach your producer / consumer scanner there, that would be great. On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt prat...@adobe.com wrote: I wrote myself a Scanner wrapper that uses a producer/consumer queue to keep the client fed with a full buffer as much as possible. When scanning my table with scanner caching at 100 records, I see about a 24% uplift in performance (~35k records/sec with the ClientScanner and ~44k records/sec with my P/C scanner). However, when I set scanner caching to 5000, it's more of a wash compared to the standard ClientScanner: ~53k records/sec with the ClientScanner and ~60k records/sec with the P/C scanner. I'm not sure what to make of those results. I think next I'll shut down HBase and read the HFiles directly, to see if there's a drop off in performance between reading them directly vs. via the RegionServer. I still think that to really solve
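Sandy's wrapper isn't attached to the thread, but the producer/consumer idea is simple to sketch: a background thread drains the real ResultScanner into a bounded queue so the client rarely waits on the network. The following is a rough, untested illustration against the 0.94 client API, not Sandy's actual code; real code needs error propagation and a way to stop the producer early:

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;

  /** Wraps a ResultScanner and prefetches Results on a background thread. */
  public class PrefetchingScanner {
    private static final Result POISON = new Result();  // end-of-stream marker
    private final BlockingQueue<Result> queue;
    private final Thread producer;

    public PrefetchingScanner(final ResultScanner scanner, int bufferSize) {
      this.queue = new ArrayBlockingQueue<Result>(bufferSize);
      this.producer = new Thread(new Runnable() {
        public void run() {
          try {
            Result r;
            while ((r = scanner.next()) != null) {
              queue.put(r);  // blocks when the buffer is full
            }
          } catch (Exception e) {
            // real code should hand this exception to the consumer
          } finally {
            try { queue.put(POISON); } catch (InterruptedException ignored) { }
            scanner.close();
          }
        }
      });
      this.producer.setDaemon(true);
      this.producer.start();
    }

    /** Returns the next Result, or null when the underlying scanner is exhausted. */
    public Result next() throws InterruptedException {
      Result r = queue.take();
      return r == POISON ? null : r;
    }
  }

Usage would be something like new PrefetchingScanner(table.getScanner(scan), 16), then loop on next() until it returns null; the buffer size and error handling are where the real tuning lives.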
Re: How many column families in one table ?
Can someone please reply ? Also what is the typical read/write speed of hbase and how much deviation would be there in my scenario mentioned above (14 cf , total 140 columns ) ? I am asking this because i am not simply printing out the scanned values , instead i am applying some logic on the data retrieved per row basis. So was just curious to find if that small logic in my code is contributing towards the long time taken to scan the table. On Mon, Jul 1, 2013 at 2:41 PM, Vimal Jain vkj...@gmail.com wrote: I scanned it during normal traffic hours.There was no I/O load on the server. I dont see any GC locks too. Also i have given 1.5G to RS , 512M to each Master and Zookeeper. One correction in the post above : Actual time to scan whole table is even more , it takes 10 mins to scan 0.1 million rows ( so total of 2.5 hours to scan 1.6 million rows) . The time i mentioned in previous post was for different type of lookup.Please ignore that. On Mon, Jul 1, 2013 at 2:24 PM, Viral Bajaria viral.baja...@gmail.comwrote: When you did the scan, did you check what the bottleneck was ? Was it I/O ? Did you see any GC locks ? How much RAM are you giving to your RS ? -Viral On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: To completely scan the table for all 140 columns , it takes around 30-40 minutes. -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain
Re: Behavior of Filter.transform() in FilterList?
You want transform to only be called on filters that are reached? I.e. with FilterA and FilterB, FilterB.transform should not be called if a KV is already filtered by FilterA? That's not how it works right now; transform is called in a completely different code path from the actual filtering logic. -- Lars

- Original Message - From: Christophe Taton ta...@wibidata.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 10:26 PM Subject: Re: Behavior of Filter.transform() in FilterList?

On Sun, Jun 30, 2013 at 10:15 PM, Ted Yu yuzhih...@gmail.com wrote: The clause 'family=X and column=Y and KeyOnlyFilter' would be represented by a FilterList, right ? (family=A and column=B) would be represented by another FilterList.

Yes, that would be FilterList(OR, [FilterList(AND, [family=X, column=Y, KeyOnlyFilter]), FilterList(AND, [family=A, column=B])]).

So the behavior is expected.

Could you explain? I'm not sure how you reach this conclusion. Are you saying it is expected, given the actual implementation of FilterList.transform()? Or are there some other details I missed? Thanks! C.

On Mon, Jul 1, 2013 at 1:10 PM, Christophe Taton ta...@wibidata.com wrote: Hi, From https://github.com/apache/hbase/blob/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java#L183 , it appears that Filter.transform() is invoked unconditionally on all filters in a FilterList hierarchy. This is quite confusing, especially since I may construct a filter like: (family=X and column=Y and KeyOnlyFilter) or (family=A and column=B) The KeyOnlyFilter will remove all values from the KeyValues in A:B as well. Is my understanding correct? Is this an expected/intended behavior? Thanks, C.
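For readers following along, the filter Christophe describes would be assembled roughly as below (0.94 API; the family and qualifier names are placeholders). The behavior under discussion is that transform() from the KeyOnlyFilter in the first branch ends up being applied even to KeyValues that only match the second branch:

  import java.util.Arrays;
  import org.apache.hadoop.hbase.filter.BinaryComparator;
  import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
  import org.apache.hadoop.hbase.filter.FamilyFilter;
  import org.apache.hadoop.hbase.filter.Filter;
  import org.apache.hadoop.hbase.filter.FilterList;
  import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
  import org.apache.hadoop.hbase.filter.QualifierFilter;
  import org.apache.hadoop.hbase.util.Bytes;

  public class FilterListExample {
    public static Filter build() {
      // (family=X and column=Y and KeyOnlyFilter)
      Filter left = new FilterList(FilterList.Operator.MUST_PASS_ALL, Arrays.<Filter>asList(
          new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("X"))),
          new QualifierFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("Y"))),
          new KeyOnlyFilter()));
      // (family=A and column=B)
      Filter right = new FilterList(FilterList.Operator.MUST_PASS_ALL, Arrays.<Filter>asList(
          new FamilyFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("A"))),
          new QualifierFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("B")))));
      // OR the two branches together.
      return new FilterList(FilterList.Operator.MUST_PASS_ONE, Arrays.<Filter>asList(left, right));
    }
  }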
Re: How many column families in one table ?
Which version of HBase? Did you enable scanner caching? Otherwise each call to next() is an RPC roundtrip and you are basically measuring your network's RTT. -- Lars

From: Vimal Jain vkj...@gmail.com To: user@hbase.apache.org Sent: Monday, July 1, 2013 4:11 AM Subject: Re: How many column families in one table ?

Can someone please reply ? Also what is the typical read/write speed of hbase and how much deviation would be there in my scenario mentioned above (14 cf , total 140 columns ) ? I am asking this because i am not simply printing out the scanned values , instead i am applying some logic on the data retrieved per row basis. So was just curious to find if that small logic in my code is contributing towards the long time taken to scan the table.

On Mon, Jul 1, 2013 at 2:41 PM, Vimal Jain vkj...@gmail.com wrote: I scanned it during normal traffic hours.There was no I/O load on the server. I dont see any GC locks too. Also i have given 1.5G to RS , 512M to each Master and Zookeeper. One correction in the post above : Actual time to scan whole table is even more , it takes 10 mins to scan 0.1 million rows ( so total of 2.5 hours to scan 1.6 million rows) . The time i mentioned in previous post was for different type of lookup.Please ignore that.

On Mon, Jul 1, 2013 at 2:24 PM, Viral Bajaria viral.baja...@gmail.com wrote: When you did the scan, did you check what the bottleneck was ? Was it I/O ? Did you see any GC locks ? How much RAM are you giving to your RS ? -Viral

On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: To completely scan the table for all 140 columns , it takes around 30-40 minutes.

-- Thanks and Regards, Vimal Jain
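To make the scanner-caching point concrete, this is the kind of client-side setting Lars means; a sketch only, with example numbers that need tuning against the client and server heap:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  public class FullTableScan {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable");
      Scan scan = new Scan();
      scan.setCaching(1000);       // rows fetched per RPC instead of one per next()
      scan.setCacheBlocks(false);  // don't pollute the block cache on a full scan
      ResultScanner scanner = table.getScanner(scan);
      long rows = 0;
      try {
        Result r;
        while ((r = scanner.next()) != null) {
          rows++;                  // per-row application logic would go here
        }
      } finally {
        scanner.close();
        table.close();
      }
      System.out.println("scanned " + rows + " rows");
    }
  }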
Re: How many column families in one table ?
Hi Lars, I am using Hadoop version - 1.1.2 and Hbase version - 0.94.7. Yes , I have enabled scanner caching with value 10K but performance is not too good. :( On Mon, Jul 1, 2013 at 4:48 PM, lars hofhansl la...@apache.org wrote: Which version of HBase? Did you enable scanner caching? Otherwise each call to next() is a RPC roundtrip and you are basically measuring your networks RTT. -- Lars From: Vimal Jain vkj...@gmail.com To: user@hbase.apache.org Sent: Monday, July 1, 2013 4:11 AM Subject: Re: How many column families in one table ? Can someone please reply ? Also what is the typical read/write speed of hbase and how much deviation would be there in my scenario mentioned above (14 cf , total 140 columns ) ? I am asking this because i am not simply printing out the scanned values , instead i am applying some logic on the data retrieved per row basis. So was just curious to find if that small logic in my code is contributing towards the long time taken to scan the table. On Mon, Jul 1, 2013 at 2:41 PM, Vimal Jain vkj...@gmail.com wrote: I scanned it during normal traffic hours.There was no I/O load on the server. I dont see any GC locks too. Also i have given 1.5G to RS , 512M to each Master and Zookeeper. One correction in the post above : Actual time to scan whole table is even more , it takes 10 mins to scan 0.1 million rows ( so total of 2.5 hours to scan 1.6 million rows) . The time i mentioned in previous post was for different type of lookup.Please ignore that. On Mon, Jul 1, 2013 at 2:24 PM, Viral Bajaria viral.baja...@gmail.com wrote: When you did the scan, did you check what the bottleneck was ? Was it I/O ? Did you see any GC locks ? How much RAM are you giving to your RS ? -Viral On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: To completely scan the table for all 140 columns , it takes around 30-40 minutes. -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain
Re: How many column families in one table ?
bq. I have configured Hbase in pseudo distributed mode on top of HDFS. What was the reason for using pseudo distributed mode in production setup ? Cheers On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: Thanks Dhaval/Michael/Ted/Otis for your replies. Actually , i asked this question because i am seeing some performance degradation in my production Hbase setup. I have configured Hbase in pseudo distributed mode on top of HDFS. I have created 17 Column families :( . I am actually using 14 out of these 17 column families. Each column family has around on average 8-10 column qualifiers so total around 140 columns are there for each row key. I have around 1.6 millions rows in the table. To completely scan the table for all 140 columns , it takes around 30-40 minutes. Is it normal or Should i redesign my table schema ( probably merging 4-5 column families into one , so that at the end i have just 3-4 cf ) ? On Sat, Jun 29, 2013 at 12:06 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hm, works for me - http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning Shorter version: http://search-hadoop.com/m/qOx8l15Z1q42 Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Fri, Jun 28, 2013 at 8:40 AM, Vimal Jain vkj...@gmail.com wrote: Hi All , Thanks for your replies. Ted, Thanks for the link, but its not working . :( On Fri, Jun 28, 2013 at 5:57 PM, Ted Yu yuzhih...@gmail.com wrote: Vimal: Please also refer to: http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel michael_se...@hotmail.com wrote: Short answer... As few as possible. 14 CF doesn't make too much sense. Sent from a remote device. Please excuse any typos... Mike Segel On Jun 28, 2013, at 12:20 AM, Vimal Jain vkj...@gmail.com wrote: Hi, How many column families should be there in an hbase table ? Is there any performance issue in read/write if we have more column families ? I have designed one table with around 14 column families in it with each having on average 6 qualifiers. Is it a good design ? -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain
Re: How many column families in one table ?
Hi, We had some hardware constraints along with the fact that our total data size was in GBs. Thats why to start with Hbase , we first began with pseudo distributed mode and thought if required we would upgrade to fully distributed mode. On Mon, Jul 1, 2013 at 5:09 PM, Ted Yu yuzhih...@gmail.com wrote: bq. I have configured Hbase in pseudo distributed mode on top of HDFS. What was the reason for using pseudo distributed mode in production setup ? Cheers On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: Thanks Dhaval/Michael/Ted/Otis for your replies. Actually , i asked this question because i am seeing some performance degradation in my production Hbase setup. I have configured Hbase in pseudo distributed mode on top of HDFS. I have created 17 Column families :( . I am actually using 14 out of these 17 column families. Each column family has around on average 8-10 column qualifiers so total around 140 columns are there for each row key. I have around 1.6 millions rows in the table. To completely scan the table for all 140 columns , it takes around 30-40 minutes. Is it normal or Should i redesign my table schema ( probably merging 4-5 column families into one , so that at the end i have just 3-4 cf ) ? On Sat, Jun 29, 2013 at 12:06 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hm, works for me - http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning Shorter version: http://search-hadoop.com/m/qOx8l15Z1q42 Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Fri, Jun 28, 2013 at 8:40 AM, Vimal Jain vkj...@gmail.com wrote: Hi All , Thanks for your replies. Ted, Thanks for the link, but its not working . :( On Fri, Jun 28, 2013 at 5:57 PM, Ted Yu yuzhih...@gmail.com wrote: Vimal: Please also refer to: http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel michael_se...@hotmail.com wrote: Short answer... As few as possible. 14 CF doesn't make too much sense. Sent from a remote device. Please excuse any typos... Mike Segel On Jun 28, 2013, at 12:20 AM, Vimal Jain vkj...@gmail.com wrote: Hi, How many column families should be there in an hbase table ? Is there any performance issue in read/write if we have more column families ? I have designed one table with around 14 column families in it with each having on average 6 qualifiers. Is it a good design ? -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain
Re: question about hbase envionmnet variable
Looking at bin/hbase:

  # check envvars which might override default args
  if [ "$HBASE_HEAPSIZE" != "" ]; then
    #echo "run with heapsize $HBASE_HEAPSIZE"
    JAVA_HEAP_MAX="-Xmx""$HBASE_HEAPSIZE""m"
    #echo $JAVA_HEAP_MAX
  fi

Meaning, if you set the HBASE_HEAPSIZE environment variable, bin/hbase would take care of setting -Xmx. Cheers

On Mon, Jul 1, 2013 at 12:11 AM, ch huang justlo...@gmail.com wrote: if i set HBASE_HEAPSIZE=2 (HEAP is 20G ) ,can i set jvm option -Xmx20g -Xms20G? if not ,how much i can set?
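In other words, HBASE_HEAPSIZE is expressed in megabytes (bin/hbase appends the "m" itself), so setting it to 2 would give a 2 MB heap, not 20 GB. For a 20 GB heap the usual setting in conf/hbase-env.sh would look something like the line below; whether the region server should really get the whole 20 GB is a separate tuning question:

  # conf/hbase-env.sh -- value is in MB; bin/hbase turns this into -Xmx20480m
  export HBASE_HEAPSIZE=20480
  # extra JVM flags such as -Xms can go into HBASE_OPTS instead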
Re: Issues with delete markers
So, yesterday, I implemented this change via a coprocessor which basically initiates a raw scan, keeps track of the number of delete markers encountered, and stops when a configured threshold is met. It instantiates its own ScanDeleteTracker to do the masking through delete markers. So: raw scan, count delete markers and stop if too many are encountered, and mask them so as to return sane stuff back to the client. I guess until now it has been working reasonably. Also, with HBASE-8809, version tracking etc. should also work with filters now.

On Mon, Jul 1, 2013 at 3:58 AM, lars hofhansl la...@apache.org wrote: That would be quite dramatic change, we cannot pass delete markers to the existing filters without confusing them. We could invent a new method (filterDeleteKV or filterDeleteMarker or something) on filters along with a new filter type that implements that method. -- Lars

- Original Message - From: Varun Sharma va...@pinterest.com To: d...@hbase.apache.org d...@hbase.apache.org; user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 1:56 PM Subject: Re: Issues with delete markers

Sorry, typo, i meant that for user scans, should we be passing delete markers through the filters as well ? Varun

On Sun, Jun 30, 2013 at 1:03 PM, Varun Sharma va...@pinterest.com wrote: For user scans, i feel we should be passing delete markers through as well.

On Sun, Jun 30, 2013 at 12:35 PM, Varun Sharma va...@pinterest.com wrote: I tried this a little bit and it seems that filters are not called on delete markers. For raw scans returning delete markers, does it make sense to do that ? Varun

On Sun, Jun 30, 2013 at 12:03 PM, Varun Sharma va...@pinterest.com wrote: Hi, We are having an issue with the way HBase does handling of deletes. We are looking to retrieve 300 columns in a row but the row has tens of thousands of delete markers in it before we reach the 300 columns, something like this: row DeleteCol1 Col1 DeleteCol2 Col2 ... DeleteCol3 Col3 And so on. Therefore, the issue here being that to retrieve these 300 columns, we need to go through tens of thousands of deletes - sometimes we get a spurt of these queries and that DDoSes a region server. We are okay with saying, only return first 300 columns and stop once you encounter, say 5K column delete markers or something. I wonder if such a construct is provided by HBase or do we need to build something on top of the RAW scan and handle the delete masking there. Thanks Varun
Re: Issues with delete markers
I mean version tracking with delete markers... On Mon, Jul 1, 2013 at 8:17 AM, Varun Sharma va...@pinterest.com wrote: So, yesterday, I implemented this change via a coprocessor which basically initiates a scan which is raw, keeps tracking of # of delete markers encountered and stops when a configured threshold is met. It instantiates its own ScanDeleteTracker to do the masking through delete markers. So raw scan, count delete markers/stop if too many encountered and mask them so to return sane stuff back to the client. I guess until now it has been working reasonably. Also, with HBase 8809, version tracking etc. should also work with filters now. On Mon, Jul 1, 2013 at 3:58 AM, lars hofhansl la...@apache.org wrote: That would be quite dramatic change, we cannot pass delete markers to the existing filters without confusing them. We could invent a new method (filterDeleteKV or filterDeleteMarker or something) on filters along with a new filter type that implements that method. -- Lars - Original Message - From: Varun Sharma va...@pinterest.com To: d...@hbase.apache.org d...@hbase.apache.org; user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 1:56 PM Subject: Re: Issues with delete markers Sorry, typo, i meant that for user scans, should we be passing delete markers through.the filters as well ? Varun On Sun, Jun 30, 2013 at 1:03 PM, Varun Sharma va...@pinterest.com wrote: For user scans, i feel we should be passing delete markers through as well. On Sun, Jun 30, 2013 at 12:35 PM, Varun Sharma va...@pinterest.com wrote: I tried this a little bit and it seems that filters are not called on delete markers. For raw scans returning delete markers, does it make sense to do that ? Varun On Sun, Jun 30, 2013 at 12:03 PM, Varun Sharma va...@pinterest.com wrote: Hi, We are having an issue with the way HBase does handling of deletes. We are looking to retrieve 300 columns in a row but the row has tens of thousands of delete markers in it before we span the 300 columns something like this row DeleteCol1 Col1 DeleteCol2 Col2 ... DeleteCol3 Col3 And so on. Therefore, the issue here, being that to retrieve these 300 columns, we need to go through tens of thousands of deletes - sometimes we get a spurt of these queries and that DDoSes a region server. We are okay with saying, only return first 300 columns and stop once you encounter, say 5K column delete markers or something. I wonder if such a construct is provided by HBase or do we need to build something on top of the RAW scan and handle the delete masking there. Thanks Varun
Re: Issues with delete markers
That is the easy part :) The hard part is to add this to filters in a backwards compatible way. -- Lars - Original Message - From: Varun Sharma va...@pinterest.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Cc: d...@hbase.apache.org d...@hbase.apache.org Sent: Monday, July 1, 2013 8:18 AM Subject: Re: Issues with delete markers I mean version tracking with delete markers... On Mon, Jul 1, 2013 at 8:17 AM, Varun Sharma va...@pinterest.com wrote: So, yesterday, I implemented this change via a coprocessor which basically initiates a scan which is raw, keeps tracking of # of delete markers encountered and stops when a configured threshold is met. It instantiates its own ScanDeleteTracker to do the masking through delete markers. So raw scan, count delete markers/stop if too many encountered and mask them so to return sane stuff back to the client. I guess until now it has been working reasonably. Also, with HBase 8809, version tracking etc. should also work with filters now. On Mon, Jul 1, 2013 at 3:58 AM, lars hofhansl la...@apache.org wrote: That would be quite dramatic change, we cannot pass delete markers to the existing filters without confusing them. We could invent a new method (filterDeleteKV or filterDeleteMarker or something) on filters along with a new filter type that implements that method. -- Lars - Original Message - From: Varun Sharma va...@pinterest.com To: d...@hbase.apache.org d...@hbase.apache.org; user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 1:56 PM Subject: Re: Issues with delete markers Sorry, typo, i meant that for user scans, should we be passing delete markers through.the filters as well ? Varun On Sun, Jun 30, 2013 at 1:03 PM, Varun Sharma va...@pinterest.com wrote: For user scans, i feel we should be passing delete markers through as well. On Sun, Jun 30, 2013 at 12:35 PM, Varun Sharma va...@pinterest.com wrote: I tried this a little bit and it seems that filters are not called on delete markers. For raw scans returning delete markers, does it make sense to do that ? Varun On Sun, Jun 30, 2013 at 12:03 PM, Varun Sharma va...@pinterest.com wrote: Hi, We are having an issue with the way HBase does handling of deletes. We are looking to retrieve 300 columns in a row but the row has tens of thousands of delete markers in it before we span the 300 columns something like this row DeleteCol1 Col1 DeleteCol2 Col2 ... DeleteCol3 Col3 And so on. Therefore, the issue here, being that to retrieve these 300 columns, we need to go through tens of thousands of deletes - sometimes we get a spurt of these queries and that DDoSes a region server. We are okay with saying, only return first 300 columns and stop once you encounter, say 5K column delete markers or something. I wonder if such a construct is provided by HBase or do we need to build something on top of the RAW scan and handle the delete masking there. Thanks Varun
Re: How many column families in one table ?
The performance you're seeing is definitely not typical. A couple of further questions:
- How large are your KVs (columns)?
- Do you delete data? Do you run major compactions?
- Can you measure CPU, IO, context switches, etc., during the scanning?
- Do you have many versions of the columns?
Note that HBase is a key value store, i.e. the storage is sparse. Each column is represented by its own key value pair, and HBase has to do the work to reassemble the data. -- Lars

From: Vimal Jain vkj...@gmail.com To: user@hbase.apache.org Sent: Monday, July 1, 2013 4:44 AM Subject: Re: How many column families in one table ?

Hi, We had some hardware constraints along with the fact that our total data size was in GBs. Thats why to start with Hbase , we first began with pseudo distributed mode and thought if required we would upgrade to fully distributed mode.

On Mon, Jul 1, 2013 at 5:09 PM, Ted Yu yuzhih...@gmail.com wrote: bq. I have configured Hbase in pseudo distributed mode on top of HDFS. What was the reason for using pseudo distributed mode in production setup ? Cheers

On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: Thanks Dhaval/Michael/Ted/Otis for your replies. Actually , i asked this question because i am seeing some performance degradation in my production Hbase setup. I have configured Hbase in pseudo distributed mode on top of HDFS. I have created 17 Column families :( . I am actually using 14 out of these 17 column families. Each column family has around on average 8-10 column qualifiers so total around 140 columns are there for each row key. I have around 1.6 millions rows in the table. To completely scan the table for all 140 columns , it takes around 30-40 minutes. Is it normal or Should i redesign my table schema ( probably merging 4-5 column families into one , so that at the end i have just 3-4 cf ) ?

On Sat, Jun 29, 2013 at 12:06 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hm, works for me - http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning Shorter version: http://search-hadoop.com/m/qOx8l15Z1q42 Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm

On Fri, Jun 28, 2013 at 8:40 AM, Vimal Jain vkj...@gmail.com wrote: Hi All , Thanks for your replies. Ted, Thanks for the link, but its not working . :(

On Fri, Jun 28, 2013 at 5:57 PM, Ted Yu yuzhih...@gmail.com wrote: Vimal: Please also refer to: http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning

On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel michael_se...@hotmail.com wrote: Short answer... As few as possible. 14 CF doesn't make too much sense. Sent from a remote device. Please excuse any typos... Mike Segel

On Jun 28, 2013, at 12:20 AM, Vimal Jain vkj...@gmail.com wrote: Hi, How many column families should be there in an hbase table ? Is there any performance issue in read/write if we have more column families ? I have designed one table with around 14 column families in it with each having on average 6 qualifiers. Is it a good design ?

-- Thanks and Regards, Vimal Jain
Re: How many column families in one table ?
Hi Lars, 1)I have around 140 columns for each row , out of 140 , around 100 rows are holds java primitive data type , remaining 40 rows contains serialized java object as byte array. Yes , I do delete data but the frequency is very less ( 1 out of 5K operations ). I dont run any compaction. 2) I had ran scan keeping in mind the CPU,IO and other system related parameters.I found them to be normal with system load being 0.1-0.3. 3) Yes i have 3 versions of cell ( default value). On Mon, Jul 1, 2013 at 9:08 PM, lars hofhansl la...@apache.org wrote: The performance you're seeing is definitely not typical. 'couple of further questions: - How large are your KVs (columns)?- Do you delete data? Do you run major compactions? - Can you measure: CPU, IO, context switches, etc, during the scanning? - Do you have many versions of the columns? Note that HBase is a key value store, i.e. the storage is sparse. Each column is represented by its own key value pair, and HBase has to do the work to reassemble the data. -- Lars From: Vimal Jain vkj...@gmail.com To: user@hbase.apache.org Sent: Monday, July 1, 2013 4:44 AM Subject: Re: How many column families in one table ? Hi, We had some hardware constraints along with the fact that our total data size was in GBs. Thats why to start with Hbase , we first began with pseudo distributed mode and thought if required we would upgrade to fully distributed mode. On Mon, Jul 1, 2013 at 5:09 PM, Ted Yu yuzhih...@gmail.com wrote: bq. I have configured Hbase in pseudo distributed mode on top of HDFS. What was the reason for using pseudo distributed mode in production setup ? Cheers On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: Thanks Dhaval/Michael/Ted/Otis for your replies. Actually , i asked this question because i am seeing some performance degradation in my production Hbase setup. I have configured Hbase in pseudo distributed mode on top of HDFS. I have created 17 Column families :( . I am actually using 14 out of these 17 column families. Each column family has around on average 8-10 column qualifiers so total around 140 columns are there for each row key. I have around 1.6 millions rows in the table. To completely scan the table for all 140 columns , it takes around 30-40 minutes. Is it normal or Should i redesign my table schema ( probably merging 4-5 column families into one , so that at the end i have just 3-4 cf ) ? On Sat, Jun 29, 2013 at 12:06 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hm, works for me - http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning Shorter version: http://search-hadoop.com/m/qOx8l15Z1q42 Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Fri, Jun 28, 2013 at 8:40 AM, Vimal Jain vkj...@gmail.com wrote: Hi All , Thanks for your replies. Ted, Thanks for the link, but its not working . :( On Fri, Jun 28, 2013 at 5:57 PM, Ted Yu yuzhih...@gmail.com wrote: Vimal: Please also refer to: http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel michael_se...@hotmail.com wrote: Short answer... As few as possible. 14 CF doesn't make too much sense. Sent from a remote device. Please excuse any typos... Mike Segel On Jun 28, 2013, at 12:20 AM, Vimal Jain vkj...@gmail.com wrote: Hi, How many column families should be there in an hbase table ? 
Is there any performance issue in read/write if we have more column families ? I have designed one table with around 14 column families in it with each having on average 6 qualifiers. Is it a good design ? -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain -- Thanks and Regards, Vimal Jain
BigSecret: A secure data management framework for Key-Value Stores
My name is Erman Pattuk. Together with my advisor Prof. Murat Kantarcioglu, we have developed an open source tool that enables secure and encrypted outsourcing of Key-Value stores to public cloud infrastructures. I would like to get feedback from interested users, so that I can improve and strengthen the project.

When you need to outsource your data to a public cloud, there are potential privacy and security risks. Especially if your data consists of highly private information, such as social security numbers or health records, it should be encrypted prior to outsourcing. However, doing so complicates data processing. Thus, intelligent solutions need to be created that (i) protect the privacy of the outsourced data, and (ii) enable efficient processing of the outsourced data.

Our framework, BigSecret, acts as middleware between the clients (entities that want to process queries) and Key-Value stores (which may be public or private). It is scalable, in the sense that multiple independent copies of the application can be executed, and it provides formally proven security. Initially, we have implemented BigSecret to support HBase. We have created a simple library that supports basic operations, such as Put, Get, Delete, Scan, and createTable, over encrypted key-value pairs. It's still in its infancy, but we aim to improve it over time and add support for multiple Key-Value Store implementations.

You can access:
- Source code: https://github.com/ermanpattuk/BigSecret
- Technical report: http://www.utdallas.edu/~exp111430/techReport.pdf

Our paper has been accepted at the prestigious IEEE Cloud 2013 conference. You can get a copy of the technical report via the above link. Best Regards, Erman Pattuk
Re: Behavior of Filter.transform() in FilterList?
It would make sense, but it is not immediately clear how to do so cleanly. We would no longer be able to call transform at the StoreScanner level (or we would have to evaluate the filter multiple times, or require the filters to maintain their last state and only apply transform selectively). I added transform() a while ago in order to allow a Filter *not* to transform. Before, we defensively made a copy of each key, just in case a Filter (such as KeyOnlyFilter) would modify it; now this is formalized, and the filter is responsible for making a copy only when needed. -- Lars

From: Christophe Taton ta...@wibidata.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Sent: Monday, July 1, 2013 10:27 AM Subject: Re: Behavior of Filter.transform() in FilterList?

On Mon, Jul 1, 2013 at 4:14 AM, lars hofhansl la...@apache.org wrote: You want transform to only be called on filters that are reached? I.e. FilterA and FilterB, FilterB.transform should not be called if a KV is already filtered by FilterA?

Yes, that's what I naively expected, at first.

That's not how it works right now, transform is called in a completely different code path from the actual filtering logic.

Indeed, I just learned that. I found no documentation of this behavior, did I miss it? In particular, the javadoc of the workflow of Filter doesn't mention transform() at all. Would it make sense to apply transform() only if the return code for filterKeyValue() includes the KeyValue? C.

-- Lars

- Original Message - From: Christophe Taton ta...@wibidata.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 10:26 PM Subject: Re: Behavior of Filter.transform() in FilterList?

On Sun, Jun 30, 2013 at 10:15 PM, Ted Yu yuzhih...@gmail.com wrote: The clause 'family=X and column=Y and KeyOnlyFilter' would be represented by a FilterList, right ? (family=A and colymn=B) would be represented by another FilterList.

Yes, that would be FilterList(OR, [FilterList(AND, [family=X, column=Y, KeyOnlyFilter]), FilterList(AND, [family=A, column=B])]). So the behavior is expected. Could you explain, I'm not sure how you reach this conclusion. Are you saying it is expected, given the actual implementation FilterList.transform()? Or are there some other details I missed? Thanks! C.

On Mon, Jul 1, 2013 at 1:10 PM, Christophe Taton ta...@wibidata.com wrote: Hi, From https://github.com/apache/hbase/blob/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java#L183 , it appears that Filter.transform() is invoked unconditionally on all filters in a FilterList hierarchy. This is quite confusing, especially since I may construct a filter like: (family=X and column=Y and KeyOnlyFilter) or (family=A and colymn=B) The KeyOnlyFilter will remove all values from the KeyValues in A:B as well. Is my understanding correct? Is this an expected/intended behavior? Thanks, C.
Re: How many column families in one table ?
On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain vkj...@gmail.com wrote: Sorry for the typo .. please ignore previous mail.. Here is the corrected one.. 1) I have around 140 columns for each row , out of 140 , around 100 columns hold java primitive data type , remaining 40 columns contain serialized java object as byte array (Inside each object is an ArrayList). Yes , I do delete data but the frequency is very less ( 1 out of 5K operations ). I dont run any compaction.

This answers the type of data in each cell, not the size of the data. Can you figure out the average size of the data that you insert in each cell? For example, what is the length of the byte array? Also, for the java primitives, is it an 8-byte long? A 4-byte int? In addition to that, what is in the row key? How long is that in bytes? Same for the column families: can you share the names of the column families? How about the qualifiers? If you have disabled major compactions, you should run one every few days (if not once a day) to consolidate the number of files that each scan will have to open.

2) I had run the scan keeping in mind the CPU, IO and other system related parameters. I found them to be normal, with system load being 0.1-0.3.

How many disks do you have in your box? Have you ever benchmarked the hardware? Thanks, Viral
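On the major compaction point: if automatic major compactions are disabled, they can still be triggered on a schedule, either from the shell or from a small client program such as the sketch below (0.94 admin API; the table name is a placeholder and the call is asynchronous):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class NightlyMajorCompact {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      // Queues a major compaction for every region of the table and returns immediately.
      admin.majorCompact("mytable");
      admin.close();
    }
  }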
simple export -- bulk import
I'm currently struggling with export/import between two HBase clusters. I have managed to create incremental exports from the source cluster (using hbase Export). Now I would like to bulk load the export into the destination (presumably using HFiles). The reason for the bulk load requirement is that the destination cluster is NOT tuned for individual puts (which is what the default import does). I've tried importtsv, but it seems to get confused by the exported data and I end up with incorrect data in the destination. Has anyone successfully used export + import with a bulk load at the destination? If not, are there other utils I should consider using for this use case? Thanks, Mike Ellery
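One route that may fit this use case, assuming the Import tool in your HBase version supports the import.bulk.output option (worth verifying with its usage message before relying on it): let Import write HFiles instead of doing puts, then hand those files to the bulk loader. Roughly:

  # 1) On the source cluster: export the table (this part already works here)
  hbase org.apache.hadoop.hbase.mapreduce.Export mytable /export/mytable

  # 2) On the destination cluster: turn the export into HFiles instead of puts
  hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/bulk/mytable mytable /export/mytable

  # 3) Bulk-load the HFiles into the (pre-created) destination table
  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /bulk/mytable mytable

The destination table has to exist with matching column families before step 3, and importtsv is the wrong tool for this data: Export writes SequenceFiles of Result objects, not TSV, which is why it gets confused.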
stop_replication dangerous?
The first two tutorials for enabling replication that google gives me [1], [2] take very different tones with regard to stop_replication. The HBase docs [1] make it sound fine to start and stop replication as desired. The Cloudera docs [2] say it may cause data loss. Which is true? If data loss is possible, are we talking about data loss in the primary cluster, or data loss in the standby cluster (presumably would require reinitializing the sync with a new CopyTable). Thanks, Patrick [1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements [2] http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html
Re: stop_replication dangerous?
Yeah that package documentation ought to be changed. Mind opening a jira? Thx, J-D On Mon, Jul 1, 2013 at 1:51 PM, Patrick Schless patrick.schl...@gmail.com wrote: The first two tutorials for enabling replication that google gives me [1], [2] take very different tones with regard to stop_replication. The HBase docs [1] make it sound fine to start and stop replication as desired. The Cloudera docs [2] say it may cause data loss. Which is true? If data loss is possible, are we talking about data loss in the primary cluster, or data loss in the standby cluster (presumably would require reinitializing the sync with a new CopyTable). Thanks, Patrick [1] http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements [2] http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html
Re: Poor HBase map-reduce scan performance
Bryan, 3.6x improvement seems exciting. The ballpark difference between HBase scan and hdfs scan is in that order, so it is expected I guess. I plan to get back to the trunk patch, add more tests etc next week. In the mean time, if you have any changes to the patch, pls attach the patch. Enis On Mon, Jul 1, 2013 at 3:59 AM, lars hofhansl la...@apache.org wrote: Absolutely. - Original Message - From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail of HBASE-8369, there were some comments which are yet to be addressed. I think trunk patch should be finalized before backporting. Cheers On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller brya...@gmail.com wrote: I'll attach my patch to HBASE-8369 tomorrow. On Jun 28, 2013, at 10:56 AM, lars hofhansl la...@apache.org wrote: If we can make a clean patch with minimal impact to existing code I would be supportive of a backport to 0.94. -- Lars - Original Message - From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Cc: Sent: Tuesday, June 25, 2013 1:56 AM Subject: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning functional on my system. Performance is dramatically better, as expected i suppose. I'm seeing about 3.6x faster performance vs TableInputFormat. Also, HBase doesn't get bogged down during a scan as the regionserver is being bypassed. I'm very excited by this. There are some issues with file permissions and library dependencies but nothing that can't be worked out. On Jun 5, 2013, at 6:03 PM, lars hofhansl la...@apache.org wrote: That's exactly the kind of pre-fetching I was investigating a bit ago (made a patch, but ran out of time). This pre-fetching is strictly client only, where the client keeps the server busy while it is processing the previous batch, but filling up a 2nd buffer. -- Lars From: Sandy Pratt prat...@adobe.com To: user@hbase.apache.org user@hbase.apache.org Sent: Wednesday, June 5, 2013 10:58 AM Subject: Re: Poor HBase map-reduce scan performance Yong, As a thought experiment, imagine how it impacts the throughput of TCP to keep the window size at 1. That means there's only one packet in flight at a time, and total throughput is a fraction of what it could be. That's effectively what happens with RPC. The server sends a batch, then does nothing while it waits for the client to ask for more. During that time, the pipe between them is empty. Increasing the batch size can help a bit, in essence creating a really huge packet, but the problem remains. There will always be stalls in the pipe. What you want is for the window size to be large enough that the pipe is saturated. A streaming API accomplishes that by stuffing data down the network pipe as quickly as possible. Sandy On 6/5/13 7:55 AM, yonghu yongyong...@gmail.com wrote: Can anyone explain why client + rpc + server will decrease the performance of scanning? I mean the Regionserver and Tasktracker are the same node when you use MapReduce to scan the HBase table. So, in my understanding, there will be no rpc cost. Thanks! Yong On Wed, Jun 5, 2013 at 10:09 AM, Sandy Pratt prat...@adobe.com wrote: https://issues.apache.org/jira/browse/HBASE-8691 On 6/4/13 6:11 PM, Sandy Pratt prat...@adobe.com wrote: Haven't had a chance to write a JIRA yet, but I thought I'd pop in here with an update in the meantime. 
I tried a number of different approaches to eliminate latency and bubbles in the scan pipeline, and eventually arrived at adding a streaming scan API to the region server, along with refactoring the scan interface into an event-driven message receiver interface. In so doing, I was able to take scan speed on my cluster from 59,537 records/sec with the classic scanner to 222,703 records/sec with my new scan API. Needless to say, I'm pleased ;) More details forthcoming when I get a chance. Thanks, Sandy On 5/23/13 3:47 PM, Ted Yu yuzhih...@gmail.com wrote: Thanks for the update, Sandy. If you can open a JIRA and attach your producer / consumer scanner there, that would be great. On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt prat...@adobe.com wrote: I wrote myself a Scanner wrapper that uses a producer/consumer queue to keep the client fed with a full buffer as much as possible. When scanning my table with scanner caching at 100 records, I see about a 24% uplift in performance (~35k records/sec with
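For readers wondering what the producer/consumer wrapper looks like in practice, here is a minimal sketch of the idea (my own illustration, not Sandy's code): one background thread drives the ResultScanner and fills a bounded queue, so the next RPC batch is being fetched while the caller is still processing the previous one.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;

public class PrefetchingScanner implements Runnable {
  private static final Result DONE = new Result();   // sentinel marking end of scan
  private final ResultScanner scanner;
  private final BlockingQueue<Result> queue = new ArrayBlockingQueue<Result>(1000);

  public PrefetchingScanner(ResultScanner scanner) {
    this.scanner = scanner;
    new Thread(this, "scan-prefetch").start();       // producer thread
  }

  public void run() {
    try {
      for (Result r : scanner) {                     // each iteration may trigger a next() RPC
        queue.put(r);
      }
    } catch (Exception e) {
      // a real implementation would hand the error to the consumer
    } finally {
      try { queue.put(DONE); } catch (InterruptedException ignored) { }
      scanner.close();
    }
  }

  // Returns the next Result, or null once the scan is exhausted.
  public Result next() throws InterruptedException {
    Result r = queue.take();
    return r == DONE ? null : r;
  }
}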
Re: Poor HBase map-reduce scan performance
I attached my patch to the JIRA issue, in case anyone is interested. It can pretty easily be used on its own without patching HBase. I am currently doing this. On Jul 1, 2013, at 2:23 PM, Enis Söztutar enis@gmail.com wrote: Bryan, 3.6x improvement seems exciting. The ballpark difference between HBase scan and hdfs scan is in that order, so it is expected I guess. I plan to get back to the trunk patch, add more tests etc next week. In the mean time, if you have any changes to the patch, pls attach the patch. Enis
Re: data loss after cluster wide power loss
Yes this is a known issue. The HDFS part of this was addressed in https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not available in the 1.x releases. I think HBase does not use this API yet. On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham lat...@davelink.net wrote: We're running HBase over HDFS 1.0.2 on about 1000 nodes. On Saturday the data center we were in had a total power failure and the cluster went down hard. When we brought it back up, HDFS reported 4 files as CORRUPT. We recovered the data in question from our secondary datacenter, but I'm trying to understand what happened and whether this is a bug in HDFS that should be fixed. From what I can tell the file was created and closed by the dfs client (hbase). Then HBase renamed it into a new directory and deleted some other files containing the same data. Then the cluster lost power. After the cluster was restarted, the datanodes reported into the namenode but the blocks for this file appeared as blocks being written - the namenode rejected them and the datanodes deleted the blocks. At this point there were no replicas for the blocks and the files were marked CORRUPT. The underlying file systems are ext3. Some questions that I would love to get answers for, if anyone with a deeper understanding of HDFS can chime in:
- Is this a known scenario where data loss is expected? (I found HDFS-1539 but that seems different)
- When are blocks moved from blocksBeingWritten to current? Does that happen before a file close operation is acknowledged to an HDFS client?
- Could it be that the DataNodes actually moved the blocks to current but after the restart ext3 rewound state somehow (forgive my ignorance of underlying file system behavior)?
- Is there any other explanation for how this can happen?
Here is a sequence of selected relevant log lines from the RS (HBase Region Server), NN (NameNode) and DN (DataNode - 1 example of the 3 in question). It includes everything that mentions the block in question in the NameNode and one DataNode log. Please let me know if there is more information that would be helpful.
RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c with permission=rwxrwxrwx
NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. blk_1395839728632046111_357084589
DN 2013-06-29 11:16:06,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: /10.0.5.237:50010
NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to blk_1395839728632046111_357084589 size 25418340
NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to blk_1395839728632046111_357084589 size 25418340
NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to blk_1395839728632046111_357084589 size 25418340
DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_1395839728632046111_357084589 terminating
NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c from client DFSClient_hb_rs_hs745,60020,1372470111932
NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c is closed by DFSClient_hb_rs_hs745,60020,1372470111932
RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c to hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 7 file(s) in n of users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
--- CRASH, RESTART ---
NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.1:50010
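For context on what HDFS-744 adds: starting with 2.0.2-alpha the output stream exposes hsync(), which forces everything written so far onto the platters rather than just into the datanodes' page cache. A minimal sketch, assuming a Hadoop 2.x client (the path is a placeholder); as noted above, this API does not exist on the 1.x branch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HsyncExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/hsync-demo"));
    out.write("some edits".getBytes("UTF-8"));
    out.hflush();  // visible to other readers, but possibly only in OS buffers
    out.hsync();   // HDFS-744: ask the datanodes to fsync the block file to disk
    out.close();
  }
}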
Re: stop_replication dangerous?
Sure thing: https://issues.apache.org/jira/browse/HBASE-8844 On Mon, Jul 1, 2013 at 3:59 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Yeah that package documentation ought to be changed. Mind opening a jira? Thx, J-D
Re: data loss after cluster wide power loss
Thanks for the response, Suresh. I'm not sure that I understand the details properly. From my reading of HDFS-744 the hsync API would allow a client to make sure that at any point in time its writes so far have hit the disk. For example, for HBase it could apply an fsync after adding some edits to its WAL to ensure those edits are fully durable for a file which is still open. However, in this case the dfs file was closed and even renamed. Is it the case that even after a dfs file is closed and renamed, the data blocks would still not be synced and would still be stored by the datanode in blocksBeingWritten rather than in current? If that is the case, would it be better for the NameNode not to reject replicas that are in blocksBeingWritten, especially if it doesn't have any other replicas available? Dave On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas sur...@hortonworks.com wrote: Yes this is a known issue. The HDFS part of this was addressed in https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not available in the 1.x releases. I think HBase does not use this API yet.
Re: how can i improve sequence write speed?
Hello there, I'm sorry I didn't quite get it. What do you mean by sequence write speed? If you are looking for ways to improve HBase writes, you might find this useful: http://hbase.apache.org/book/perf.writing.html Warm Regards, Tariq cloudfront.blogspot.com On Mon, Jul 1, 2013 at 9:44 AM, ch huang justlo...@gmail.com wrote: I deployed an HBase cluster for use in a production environment; how can I improve sequence write speed? Thanks all
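To make the perf.writing pointer concrete, the usual client-side levers in the 0.94 API are deferred autoflush, a larger write buffer, and batched puts. A minimal sketch; the table name, payload, and buffer sizes are placeholders to tune for your setup:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");     // placeholder table name
    table.setAutoFlush(false);                      // buffer puts on the client
    table.setWriteBufferSize(8 * 1024 * 1024);      // flush roughly every 8 MB
    List<Put> batch = new ArrayList<Put>();
    for (int i = 0; i < 100000; i++) {
      Put p = new Put(Bytes.toBytes(String.format("row-%08d", i)));
      p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
      batch.add(p);
      if (batch.size() == 1000) {                   // hand over puts in chunks of 1000
        table.put(batch);
        batch.clear();
      }
    }
    table.put(batch);
    table.flushCommits();                           // push anything still buffered
    table.close();
  }
}

Sequential (monotonically increasing) row keys also tend to hammer one region at a time, so pre-splitting the table or salting the key prefix usually matters as much as the client-side batching.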
Re: data loss after cluster wide power loss
HBase is interesting here, because it rewrites old data into new files. So a power outage by default would not just lose new data but potentially old data as well. You can enable sync on block close in HDFS, and then at least be sure that closed blocks (and thus files) are synced to disk physically. I found that if that is paired with the sync-behind-writes fadvise hint, the performance impact is minimal. -- Lars
Re: data loss after cluster wide power loss
How to enable sync on block close in HDFS? --Send from my Sony mobile. On Jul 2, 2013 6:47 AM, Lars Hofhansl lhofha...@yahoo.com wrote: HBase is interesting here, because it rewrites old data into new files. So a power outage by default would not just lose new data but potentially old data as well. You can enable sync on block close in HDFS, and then at least be sure that closed blocks (and thus files) are synced to disk physically. I found that if that is paired with the sync-behind-writes fadvise hint, the performance impact is minimal. -- Lars
Re: data loss after cluster wide power loss
On Mon, Jul 1, 2013 at 4:52 PM, Azuryy Yu azury...@gmail.com wrote: how to enable sync on block close in HDFS? Set dfs.datanode.synconclose to true. See https://issues.apache.org/jira/browse/HDFS-1539
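For completeness, that is a datanode-side setting, so it goes into hdfs-site.xml on the datanodes (followed by a datanode restart); a minimal snippet:

<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>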
Re: Behavior of Filter.transform() in FilterList?
On Mon, Jul 1, 2013 at 12:01 PM, lars hofhansl la...@apache.org wrote: It would make sense, but it is not immediately clear how to do so cleanly. We would no longer be able to call transform at the StoreScanner level (or evaluate the filter multiple times, or require the filters to maintain their - last - state and only apply transform selectively). I believe this change can be implemented directly in FilterList, without requiring other changes. A FilterList could compute its transformed KeyValue while applying filterKeyValue() on the filters it contains, and return the pre-computed transformed KeyValue in FilterList.transform() if it makes sense to do so. This means Filter.transform() is always applied immediately after a filterKeyValue() with a return code that includes the KeyValue, and this would be true for all filters in the hierarchy. C.
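A rough illustration of the idea Christophe describes, written against the 0.94-style KeyValue filter API (this is a sketch, not the actual FilterList or a patch, and it only covers the AND case with the INCLUDE verdict): filterKeyValue() transforms the cell as it walks the children that accept it, caches the result, and transform() simply returns that cache.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterBase;

public class TransformAwareFilterList extends FilterBase {
  private final List<Filter> filters;   // children, combined with AND semantics here
  private KeyValue transformed;         // cached transform result for the current cell

  public TransformAwareFilterList(List<Filter> filters) {
    this.filters = filters;
  }

  @Override
  public ReturnCode filterKeyValue(KeyValue kv) {
    KeyValue current = kv;
    for (Filter f : filters) {
      ReturnCode rc = f.filterKeyValue(current);
      if (rc != ReturnCode.INCLUDE) {
        transformed = kv;               // cell rejected somewhere: leave it untouched
        return rc;
      }
      current = f.transform(current);   // only filters that accepted the cell get to transform it
    }
    transformed = current;
    return ReturnCode.INCLUDE;
  }

  @Override
  public KeyValue transform(KeyValue kv) {
    return transformed;                 // hand back what filterKeyValue() computed
  }

  public void write(DataOutput out) throws IOException { /* serialization omitted in this sketch */ }

  public void readFields(DataInput in) throws IOException { /* serialization omitted in this sketch */ }
}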
Re: Behavior of Filter.transform() in FilterList?
Christophe: Looks like you have a clear idea of what to do. If you can show us in the form of a patch, that would be nice. Cheers