Re: Scan performance

2013-06-22 Thread Anoop John
Have a look at FuzzyRowFilter. -Anoop- On Sat, Jun 22, 2013 at 9:20 AM, Tony Dean tony.d...@sas.com wrote: I understand more, but I have additional questions about the internals... So, in this example I have 6000 rows x 40 columns in this table. In this test my startRow and stopRow do not
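Anoop's suggestion can be made concrete with a small model of FuzzyRowFilter's matching rule. This is a Python sketch of the semantics only, not the HBase API. It assumes the 0.94 convention that a mask byte of 0 marks a fixed position (the row key byte must equal the template byte) and a mask byte of 1 marks a position where any byte is accepted. The key layout (4-byte id, `_`, 2-byte action code) is hypothetical.

```python
def fuzzy_match(row_key: bytes, template: bytes, mask: bytes) -> bool:
    """Return True if row_key agrees with template wherever mask is 0.

    Mask convention (assumed, mirroring FuzzyRowFilter in HBase 0.94):
      0 -> fixed position, row_key[i] must equal template[i]
      1 -> fuzzy position, any byte value is accepted
    """
    if len(template) != len(mask):
        raise ValueError("template and mask must have the same length")
    if len(row_key) < len(template):
        return False
    return all(m == 1 or row_key[i] == template[i]
               for i, m in enumerate(mask))


# Hypothetical key layout: 4-byte user id, '_', 2-byte action code.
# Match action "ab" for any user: the id bytes are fuzzy, the rest fixed.
template = b"????_ab"
mask = bytes([1, 1, 1, 1, 0, 0, 0])

rows = [b"0001_ab", b"0002_cd", b"0003_ab", b"0004_ax"]
matches = [r for r in rows if fuzzy_match(r, template, mask)]
```

The real filter also uses the template to compute a seek hint toward the next key that could match, so non-matching stretches of the table are skipped rather than read; this sketch only shows the per-row test.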

Re: Logging for MR Job

2013-06-22 Thread Suraj Varma
Did you try passing in the log level via generic options? E.g. I can switch the log level of a running job via:
hadoop jar hadoop-mapreduce-examples.jar pi -D mapred.map.child.log.level=DEBUG 10 10
hadoop jar hadoop-mapreduce-examples.jar pi -D mapred.map.child.log.level=INFO 10 10
--Suraj

difference between major and minor compactions?

2013-06-22 Thread yun peng
Hi all, I am asking about the different practices for major and minor compaction... My current understanding is that minor compaction, triggered automatically, usually runs alongside online query serving (but in the background), so it is important to make it as lightweight as possible... to minimise

Re: difference between major and minor compactions?

2013-06-22 Thread Jean-Marc Spaggiari
Hi Yun, a few links:
- http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/ : there is a small paragraph about compactions which explains when they are triggered.
- http://hbase.apache.org/book/regions.arch.html (section 9.7.6.5)
You are almost right. The only thing is that HBase doesn't know when

Re: difference between major and minor compactions?

2013-06-22 Thread yun peng
Thanks, JM. It seems like the sole difference between major and minor compaction is the number of files (all of the storefiles, or just a subset). It is mentioned very briefly in http://hbase.apache.org/book/regions.arch.html that sometimes a minor compaction will ... promote
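The file-selection rule in the book section can be sketched as follows. This is a simplified Python rendering of the ratio-based selection used around 0.94, under stated assumptions: `ratio`, `min_files` and `min_size` play the roles of `hbase.hstore.compaction.ratio`, `hbase.hstore.compaction.min` and `hbase.hstore.compaction.min.size` (the defaults shown are assumptions), and the maximum file count, maximum file size and off-peak ratio are left out. The promotion flag encodes one reading of the book's remark: a minor compaction that ends up selecting every store file is run as a major one.

```python
def select_minor(sizes, ratio=1.2, min_files=3, min_size=0):
    """Pick store files for a minor compaction (simplified sketch).

    sizes: store file sizes ordered oldest -> newest.
    Starting from the oldest file, skip each file that is larger than
    max(min_size, ratio * total size of all newer files); the files left
    form the selection. Returns (selected_sizes, promoted_to_major).
    """
    n = len(sizes)
    start = 0
    while (start < n and n - start >= min_files
           and sizes[start] > max(min_size, ratio * sum(sizes[start + 1:]))):
        start += 1
    selected = sizes[start:]
    if len(selected) < min_files:
        return [], False  # too few candidates: no compaction this round
    # Assumption: a selection covering every file is run as a major compaction.
    return selected, len(selected) == n
```

With ratio 1.0, min_files 3, min_size 10 and files of 100, 50, 23, 12 and 12 bytes, the two oldest files are skipped (each is larger than the sum of the newer ones) and 23, 12 and 12 are compacted.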

Re: Scan performance

2013-06-22 Thread lars hofhansl
Essential column families help when you filter on one column but want to return *other* columns for the rows that matched the column. Check out HBASE-5416. -- Lars
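The effect of HBASE-5416 can be illustrated by counting bytes read. The Python sketch below is a model, not HBase code: a hypothetical table with a small `meta` family that the filter checks and a large `blob` family. The eager scan loads both families for every row; the on-demand scan loads `meta` first and fetches `blob` only for rows that pass. On the client side this behaviour is, to our knowledge, switched on with `Scan.setLoadColumnFamiliesOnDemand(true)` in releases that include HBASE-5416.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical table: row key -> {column family -> stored value}
Table = Dict[str, Dict[str, str]]


def scan_eager(table: Table, essential: str,
               keep: Callable[[str], bool]) -> Tuple[List[str], int]:
    """Load every family of every row, then apply the filter."""
    rows, bytes_read = [], 0
    for row in sorted(table):
        families = table[row]
        bytes_read += sum(len(v) for v in families.values())
        if keep(families[essential]):
            rows.append(row)
    return rows, bytes_read


def scan_on_demand(table: Table, essential: str,
                   keep: Callable[[str], bool]) -> Tuple[List[str], int]:
    """Load only the essential family first; load the other families
    just for rows that pass the filter."""
    rows, bytes_read = [], 0
    for row in sorted(table):
        families = table[row]
        bytes_read += len(families[essential])
        if keep(families[essential]):
            bytes_read += sum(len(v) for f, v in families.items() if f != essential)
            rows.append(row)
    return rows, bytes_read


table: Table = {
    "r1": {"meta": "ok", "blob": "x" * 1000},
    "r2": {"meta": "no", "blob": "y" * 1000},
    "r3": {"meta": "ok", "blob": "z" * 1000},
}


def is_ok(value: str) -> bool:
    return value == "ok"
```

Both scans return r1 and r3, but the on-demand scan reads 2006 bytes instead of 3006; the saving grows with the size of the non-essential family and the selectivity of the filter.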

Re: difference between major and minor compactions?

2013-06-22 Thread Jean-Marc Spaggiari
Hi Yun, there are more differences. Minor compactions do not remove the delete markers and the deleted cells; they only merge the small files into a bigger one. Only the major compaction (in 0.94) deals with the deleted cells. There are also some more compaction mechanisms coming in trunk with
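JM's distinction can be modeled in a few lines. This is a simplified Python sketch, assuming only whole-row delete markers and ignoring versions, TTLs and column-level deletes: a minor compaction merges the selected files and keeps everything, while a major compaction merges all files and drops the delete markers together with the cells they mask.

```python
from typing import List, Optional, Tuple

# A cell is (row, timestamp, value); value None stands for a whole-row delete marker.
Cell = Tuple[str, int, Optional[str]]


def merge(files: List[List[Cell]]) -> List[Cell]:
    """Merge store files into one sorted run: by row, newest timestamp first,
    and delete markers ahead of puts with the same timestamp."""
    return sorted((c for f in files for c in f),
                  key=lambda c: (c[0], -c[1], c[2] is not None))


def minor_compact(files: List[List[Cell]]) -> List[Cell]:
    """Rewrite a subset of files into one; every cell survives, markers included."""
    return merge(files)


def major_compact(files: List[List[Cell]]) -> List[Cell]:
    """Rewrite all files into one, dropping delete markers and the cells they mask."""
    out: List[Cell] = []
    deleted_at = {}  # row -> timestamp of the newest delete marker seen
    for row, ts, value in merge(files):
        if value is None:
            deleted_at.setdefault(row, ts)  # first marker seen is the newest
            continue  # the marker itself is not written out
        if row in deleted_at and ts <= deleted_at[row]:
            continue  # masked by a newer (or same-timestamp) marker
        out.append((row, ts, value))
    return out


f1 = [("r1", 1, "a"), ("r2", 1, "x")]
f2 = [("r1", 2, None)]  # delete row r1 at timestamp 2
f3 = [("r1", 3, "b")]   # a later put to r1
```

`minor_compact([f2, f3])` still carries the marker for r1, so the masked cell in `f1` stays hidden but remains on disk; only `major_compact([f1, f2, f3])` physically removes both.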

Re: Scan performance

2013-06-22 Thread lars hofhansl
Yep, generally you should design your keys such that start/stopKey can efficiently narrow the scope. If that really cannot be done (and you should try hard), the second-best option is skip scans. Filters in HBase can provide the scanner framework with hints about where to go next. They can
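The hint mechanism Lars describes can be modeled without HBase. In 0.94 a filter signals it by returning `SEEK_NEXT_USING_HINT` and supplying the target key from `getNextKeyHint`; the Python sketch below only simulates that idea over a sorted list of two-part keys, with `bisect` standing in for the scanner's seek. The key shape and the wanted values are hypothetical.

```python
from bisect import bisect_left
from typing import List, Set, Tuple

Key = Tuple[int, int]  # (leading key part, trailing key part) of a composite row key


def skip_scan(keys: List[Key], wanted: Set[int]) -> Tuple[List[Key], int]:
    """Find keys whose trailing part is in `wanted`, seeking over gaps.

    keys must be sorted. Each time the current key does not match, the
    "filter" computes a hint: the next wanted trailing value within the same
    leading part, or the start of the next leading part if none is left.
    Returns (matches, number of keys examined).
    """
    targets = sorted(wanted)
    matches: List[Key] = []
    examined = i = 0
    while i < len(keys):
        lead, trail = keys[i]
        examined += 1
        if trail in wanted:
            matches.append(keys[i])
            i += 1
            continue
        nxt = next((t for t in targets if t > trail), None)
        # (lead, inf) sorts after every key with this leading part.
        hint = (lead, nxt) if nxt is not None else (lead, float("inf"))
        i = bisect_left(keys, hint, lo=i + 1)  # seek instead of stepping
    return matches, examined


keys = [(lead, trail) for lead in range(3) for trail in range(10)]
matches, examined = skip_scan(keys, {3, 7})
```

Here 15 of the 30 keys are examined. In a real region each seek has its own cost (index lookups, possibly block reads), so the savings depend on how large the skipped gaps are.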

how many servers in an HBase cluster

2013-06-22 Thread myhbase
Hello all, I have learned HBase mostly from papers and books. According to my understanding, HBase is the kind of architecture that is more applicable to a big cluster: we should have many HDFS nodes and many HBase (region server) nodes. If we only have several servers (5-8), it seems HBase is not a good

Re: how many servers in an HBase cluster

2013-06-22 Thread Jean-Marc Spaggiari
Hi Ning, I'm personally running HBase in production with only 8 nodes. As you will see at http://wiki.apache.org/hadoop/Hbase/PoweredBy, some others are also running small clusters. So I would say it depends more on your needs than on the size. I would say the minimum is 4 to make sure you have your

Re: how many servers in an HBase cluster

2013-06-22 Thread Mohammad Tariq
Hello there, IMHO, 5-8 servers are sufficient to start with. But it's all relative to the data you have and the intensity of your reads/writes. You should have different strategies, though, based on whether it's 'read' or 'write'. You actually can't define 'big' in absolute terms.

Re: difference between major and minor compactions?

2013-06-22 Thread yun peng
I am more concerned with an available CompactionPolicy that allows the application to influence how compaction proceeds... It looks like there is a newer API in the 0.97 version

Re: how many servers in an HBase cluster

2013-06-22 Thread Mohammad Tariq
Oh, you already have heavyweight's input :). Thanks JM.

Any mechanism in Hadoop to run in background

2013-06-22 Thread yun peng
Hi all... We have a use case where we intend to run MapReduce in the background while the server serves online operations. The MapReduce job may have lower priority compared to the online jobs. I know this is a different use case of MapReduce compared to its originally targeted scenario (where

Re: how many servers in an HBase cluster

2013-06-22 Thread myhbase
Thanks for your response. Now, if 5 servers are enough, how should I install and configure my nodes? If I need 3 replicas in case of data loss, I should have at least 3 datanodes; we still have the namenode, regionserver and HMaster nodes, and zookeeper nodes, so some of them must be installed on the same

Re: how many servers in an HBase cluster

2013-06-22 Thread Mohammad Tariq
With 8 machines you can do something like this:
Machine 1 - NN+JT
Machine 2 - SNN+ZK1
Machine 3 - HM+ZK2
Machines 4-8 - DN+TT+RS
(You can run ZK3 on a slave node with some additional memory.)
DN and RS run on the same machine. Although RSs are said to hold the data, the data is actually stored
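Two constraints raised in this exchange (at least as many DataNodes as HDFS replicas, and a DataNode beside every RegionServer) can be checked against the proposed layout. A Python sketch: role abbreviations follow the thread (NN NameNode, SNN secondary NameNode, JT JobTracker, TT TaskTracker, HM HMaster, ZK ZooKeeper, DN DataNode, RS RegionServer), the host names are hypothetical, and replication 3 is the HDFS default.

```python
from collections import Counter
from typing import Dict, List, Set

Layout = Dict[str, Set[str]]  # host -> roles running on it


def proposed_layout() -> Layout:
    """The 8-machine layout suggested in the thread (ZK3 left out)."""
    layout: Layout = {
        "machine1": {"NN", "JT"},
        "machine2": {"SNN", "ZK"},
        "machine3": {"HM", "ZK"},
    }
    for i in range(4, 9):
        layout[f"machine{i}"] = {"DN", "TT", "RS"}
    return layout


def check_layout(layout: Layout, replication: int = 3) -> List[str]:
    """Flag two basic problems: too few DataNodes for the HDFS replication
    factor, and RegionServers without a co-located DataNode."""
    counts = Counter(role for roles in layout.values() for role in roles)
    problems = []
    if counts["DN"] < replication:
        problems.append(f"only {counts['DN']} DataNodes for replication {replication}")
    for host, roles in layout.items():
        if "RS" in roles and "DN" not in roles:
            problems.append(f"{host} runs an RS without a local DN")
    return problems
```

The proposed layout passes both checks with five DataNodes; a three-host layout with two DataNodes and a lone RegionServer would fail both.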

Re: how many servers in an HBase cluster

2013-06-22 Thread Jean-Marc Spaggiari
You HAVE TO run a ZK3, or else you don't need to have ZK2 and any ZK failure will be an issue. You need to have an odd number of ZK servers... Also, if you don't run MR jobs, you don't need the TT and JT... Otherwise, everything below is correct. But there are many other options; it all depends on your
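The odd-number rule follows from ZooKeeper needing a strict majority of the ensemble to stay available. A short Python sketch of the arithmetic:

```python
def quorum_size(servers: int) -> int:
    """Votes needed for a ZooKeeper quorum: a strict majority of the ensemble."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """Servers that can fail while a majority is still reachable."""
    return servers - quorum_size(servers)


# ensemble size -> (quorum size, failures tolerated)
ensembles = {n: (quorum_size(n), tolerated_failures(n)) for n in range(1, 6)}
```

Two servers tolerate no more failures than one (losing either of them stops the quorum), and four tolerate no more than three, so the even-numbered server only adds another machine that can fail.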

Re: how many servers in an HBase cluster

2013-06-22 Thread Kevin O'dell
If you run ZK with a DN/TT/RS please make sure to dedicate a hard drive and a core to the ZK process. I have seen many strange occurrences.

Re: how many servers in an HBase cluster

2013-06-22 Thread Mohammad Tariq
Yeah, I forgot to mention that the number of ZKs should be odd. Perhaps those parentheses made that statement look optional. Just to clarify, it was mandatory.

Re: running MR job and puts on the same table

2013-06-22 Thread Jean-Marc Spaggiari
Hi Rohit, the list is a bad idea. When you have millions of lines per region, are you going to put millions of them in memory in your list? Your MR will scan the entire table, row by row. If you modify the current row, when the scanner searches for the next one, it will not look at the current

Re: Scan performance

2013-06-22 Thread James Taylor
Hi Tony, Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix), a SQL skin over HBase? It has a skip scan that will let you model a multi-part row key and skip through it efficiently, as you've described. Take a look at this blog for more info:

Re: running MR job and puts on the same table

2013-06-22 Thread Rohit Kelkar
Thanks JM, I am not so concerned about holding those rows in memory because they are mostly ordered integers and I would be using a bitset, so I have some leeway in that sense. My dilemma was:
1. updating instantly within the map
2. bulk updating at the end of the map
Yes, I do understand the
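Rohit's plan (collect mostly ordered integer row ids in a bitset, then update in bulk at the end of the map) can be sketched as follows. Python stands in for the Java mapper here; `put_batch` is a hypothetical callback standing in for whatever client call writes a list of Puts, and the batch size is arbitrary.

```python
from typing import Callable, Iterator, List


class RowBitSet:
    """Set of non-negative integer row ids, stored as bits of a Python int."""

    def __init__(self) -> None:
        self._bits = 0

    def add(self, row_id: int) -> None:
        if row_id < 0:
            raise ValueError("row ids must be non-negative")
        self._bits |= 1 << row_id

    def __contains__(self, row_id: int) -> bool:
        return row_id >= 0 and (self._bits >> row_id) & 1 == 1

    def __iter__(self) -> Iterator[int]:
        bits = self._bits
        while bits:
            low = bits & -bits          # isolate the lowest set bit
            yield low.bit_length() - 1  # its position is the row id
            bits ^= low                 # clear it and continue


def flush_in_batches(rows: RowBitSet, batch_size: int,
                     put_batch: Callable[[List[int]], None]) -> None:
    """Option 2: send the collected updates at the end of the map, in batches.

    put_batch is a hypothetical stand-in for the client call that writes a
    list of Puts; here it just receives the row ids of each batch.
    """
    batch: List[int] = []
    for row_id in rows:
        batch.append(row_id)
        if len(batch) == batch_size:
            put_batch(batch)
            batch = []
    if batch:
        put_batch(batch)


seen = RowBitSet()
for row_id in [5, 1, 9, 2, 7]:  # ids collected while mapping
    seen.add(row_id)
sent: List[List[int]] = []
flush_in_batches(seen, batch_size=2, put_batch=sent.append)
```

Set bits are visited in ascending numeric order, so each batch holds neighboring ids.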

Re: Any mechanism in Hadoop to run in background

2013-06-22 Thread Suraj Varma
Yes, you can change your task tracker startup script to use nice and ionice and restart the task tracker process. The mappers and reducers spun off by this task tracker will inherit the niceness. See the first comment in http://blog.cloudera.com/blog/2011/04/hbase-dos-and-donts/ Quoting: change the
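Suraj's suggestion relies on child processes inheriting their parent's niceness. A small Unix-only Python sketch of that inheritance: the process raises its own niceness with `os.nice` and starts a child interpreter that reports its niceness. The parent stands in for the TaskTracker and the child for a task JVM; `ionice` (I/O priority) has no standard-library equivalent in Python and is not shown.

```python
import os
import subprocess
import sys


def child_niceness() -> int:
    """Start a child Python process and return the niceness it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.nice(0))"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def lower_priority_and_spawn(increment: int = 5) -> tuple:
    """Raise this process's niceness (lower its CPU priority), then spawn a child.

    Returns (own_niceness, child_niceness); on Unix they should be equal,
    because a child inherits its parent's niceness.
    """
    own = os.nice(increment)  # os.nice returns the new niceness
    return own, child_niceness()
```

Note that an unprivileged process can raise its niceness but cannot lower it again.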

Re: difference between major and minor compactions?

2013-06-22 Thread Suraj Varma
> In contrast, the major compaction is invoked in off-peak time and usually can be assumed to have resources exclusively.
There is no resource exclusivity with major compactions. It is just more resource _intensive_ because a major compaction will rewrite all the store files to end up with a single

Re: running MR job and puts on the same table

2013-06-22 Thread Jean-Marc Spaggiari
Hi Rohit, it will always be consistent. I don't see why there would be any inconsistency with the scenario you described below. JM

Re: how many servers in an HBase cluster

2013-06-22 Thread iain wright
Hi Mohammad, I am curious why you chose not to put the third ZK on the NN+JT? I was planning on doing that on a new cluster and want to confirm it would be okay. -- Iain Wright

Re: how many servers in an HBase cluster

2013-06-22 Thread Mohammad Tariq
Hello Iain, you would put a lot of pressure on the RAM if you do that. The NN already has a high memory requirement, and having the JT+ZK on the same machine as well would be too heavy, IMHO.

RE: difference between major and minor compactions?

2013-06-22 Thread Vladimir Rodionov
Major compactions flood the network, leaving too little bandwidth for other operations. The reason major compactions are so prohibitively expensive in HBase is the 2 block replicas that need to be created in the cluster for every block written locally.
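Vladimir's point can be put in numbers. A back-of-the-envelope Python sketch, assuming HDFS replication 3 (the HDFS default), that the first replica of every rewritten block lands on the region server's local DataNode, and ignoring compression, remote reads and data dropped by the compaction:

```python
def major_compaction_io(store_bytes: int, replication: int = 3) -> dict:
    """Rough traffic for rewriting `store_bytes` of store files once.

    Assumes the first replica of each new block is written to the local
    DataNode and the remaining (replication - 1) replicas go over the network.
    Ignores compression, remote reads, and data dropped by the compaction.
    """
    return {
        "read": store_bytes,
        "written_local": store_bytes,
        "sent_over_network": store_bytes * (replication - 1),
        "written_cluster_wide": store_bytes * replication,
    }


# Example: 100 GB of store files on one region server.
traffic = major_compaction_io(100)
```

For 100 GB of store files that is 100 GB read, 300 GB written across the cluster and 200 GB pushed over the network, all in one pass.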

Hbase pseudo distributed setup not starting

2013-06-22 Thread Rajkumar
After extracting, I changed the etc/hosts file and made some changes in the hdfs-site.xml and hbase-env.sh files. I can't see any HBase process running after issuing the bin/start-hbase.sh command. My hdfs-site.xml file is <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

Re: Hbase pseudo distributed setup not starting

2013-06-22 Thread Ulrich Staudinger
Is there anything in the log files? Check both logs/*.out and logs/*.log.
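Ulrich's advice can be automated with a short Python script. It assumes the logs live in a `logs` directory under the HBase install, as his `logs/*.out` and `logs/*.log` globs suggest, and that lines containing ERROR, FATAL or Exception are the ones worth reading first; both are assumptions, and a clean scan does not prove nothing went wrong.

```python
import glob
import os
from typing import Iterable, Iterator, List, Tuple

PATTERNS = ("ERROR", "FATAL", "Exception")  # assumed markers worth a look


def find_suspicious(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, text) for lines mentioning an error or exception."""
    for lineno, line in enumerate(lines, start=1):
        if any(p in line for p in PATTERNS):
            yield lineno, line.rstrip("\n")


def scan_logs(log_dir: str) -> List[Tuple[str, int, str]]:
    """Check every log_dir/*.out and log_dir/*.log file."""
    paths = sorted(glob.glob(os.path.join(log_dir, "*.out"))
                   + glob.glob(os.path.join(log_dir, "*.log")))
    hits = []
    for path in paths:
        with open(path, errors="replace") as fh:
            for lineno, text in find_suspicious(fh):
                hits.append((os.path.basename(path), lineno, text))
    return hits


if __name__ == "__main__":
    for name, lineno, text in scan_logs("logs"):  # run from the HBase install directory
        print(f"{name}:{lineno}: {text}")
```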