Have a look at FuzzyRowFilter
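To illustrate what FuzzyRowFilter does (this is a sketch of the matching idea only, not the actual HBase implementation), think of a fuzzy pair as a row-key template plus a mask, where a mask byte of 1 marks a wildcard position:

```python
# Sketch of the matching idea behind HBase's FuzzyRowFilter (not the real
# implementation): a fuzzy pair is (template, mask) where a mask byte of 1
# means "this position is a wildcard" and 0 means "must match exactly".

def fuzzy_match(row, template, mask):
    """Return True if 'row' matches 'template' at every non-wildcard position."""
    if len(row) < len(template):
        return False
    return all(m == 1 or r == t
               for r, t, m in zip(row, template, mask))

# Example: keys shaped "<4-byte user>_<4-byte day>"; match any user on day 0003.
template = b"????_0003"
mask = bytes([1, 1, 1, 1, 0, 0, 0, 0, 0])  # first 4 bytes are wildcards

print(fuzzy_match(b"u001_0003", template, mask))  # True
print(fuzzy_match(b"u001_0004", template, mask))  # False
```

The real filter additionally gives the scanner seek hints so it can jump between matching key ranges instead of testing every row.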
-Anoop-
On Sat, Jun 22, 2013 at 9:20 AM, Tony Dean tony.d...@sas.com wrote:
I understand more, but have additional questions about the internals...
So, in this example I have 6000 rows X 40 columns in this table. In this
test my startRow and stopRow do not
Did you try passing in the log level via generic options?
E.g. I can switch the log level of a running job via:
hadoop jar hadoop-mapreduce-examples.jar pi -D mapred.map.child.log.level=DEBUG 10 10
hadoop jar hadoop-mapreduce-examples.jar pi -D mapred.map.child.log.level=INFO 10 10
--Suraj
Hi, All
I am asking about the different practices of major and minor compaction... My
current understanding is that minor compaction, triggered automatically,
usually runs alongside online query serving (but in the background), so it
is important to make it as lightweight as possible... to minimise
Hi Yun,
Few links:
- http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
= There is a small paragraph about compactions which explains when
they are triggered.
- http://hbase.apache.org/book/regions.arch.html 9.7.6.5
You are almost right. Only thing is that HBase doesn't know when
Thanks, JM
It seems like the sole difference between major and minor compaction is the
number of files (all versus just a subset of the storefiles). It is mentioned
very briefly in
http://hbase.apache.org/book/regions.arch.html that
Sometimes a minor compaction will ... promote
essential column families help when you filter on one column but want to
return *other* columns for the rows that matched the column.
Check out HBASE-5416.
-- Lars
From: Vladimir Rodionov vrodio...@carrieriq.com
To: user@hbase.apache.org
Hi Yun,
There are more differences.
Minor compactions do not remove the delete markers or the deleted
cells; they only merge the small files into a bigger one. Only a major
compaction (in 0.94) will deal with the deleted cells. There are also
some more compaction mechanisms coming in trunk with
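The minor/major distinction described above can be sketched with a toy model (this is not HBase code; cells are simplified to a key/value dict and `None` stands for a delete marker):

```python
# Toy sketch (not HBase code) of the difference described above: a minor
# compaction merges store files but keeps delete markers and deleted cells;
# only a major compaction rewrites everything and drops them.

def compact(store_files, major=False):
    """Merge store files (each a dict of key -> value, where a value of
    None stands for a delete marker). Newer files shadow older ones."""
    merged = {}
    for f in store_files:          # files ordered oldest -> newest
        merged.update(f)           # newer cells win over older ones
    if major:
        # Only a major compaction removes delete markers and the cells
        # they shadow.
        merged = {k: v for k, v in merged.items() if v is not None}
    return merged

files = [{"row1": "a", "row2": "b"},   # older store file
         {"row1": None}]               # newer store file: row1 was deleted

print(compact(files, major=False))  # marker kept: {'row1': None, 'row2': 'b'}
print(compact(files, major=True))   # marker and cell dropped: {'row2': 'b'}
```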
Yep, generally you should design your keys such that start/stopKey can
efficiently narrow the scope.
If that really cannot be done (and you should try hard), the second-best
option is skip scans.
Filters in HBase allow providing the scanner framework with hints about
where to go next.
They can
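The seek-hint idea can be sketched as follows (a simplified illustration, not HBase internals: the "hint" is modeled as a binary search to the next possibly-matching key instead of examining every row):

```python
# Sketch (not HBase internals) of the seek-hint idea behind skip scans:
# instead of examining every row, jump straight to the next key range
# that could match.

import bisect

def skip_scan(sorted_keys, wanted_prefixes):
    """Return keys whose prefix is in wanted_prefixes, seeking between
    prefixes instead of scanning every key."""
    results = []
    for prefix in sorted(wanted_prefixes):
        # "Seek hint": jump directly to the first key >= prefix.
        i = bisect.bisect_left(sorted_keys, prefix)
        while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
            results.append(sorted_keys[i])
            i += 1
    return results

keys = ["a1", "a2", "b1", "c1", "c2", "d1"]
print(skip_scan(keys, ["a", "c"]))  # ['a1', 'a2', 'c1', 'c2']
```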
Hello All,
I have learned HBase mostly from papers and books. According to my
understanding, HBase is the kind of architecture that is more applicable
to a big cluster: we should have many HDFS nodes and many HBase (region
server) nodes. If we only have several servers (5-8), it seems HBase is
not a good
Hi Ning,
I'm personally running HBase in production with only 8 nodes.
As you will see here: http://wiki.apache.org/hadoop/Hbase/PoweredBy
some are also running small clusters.
So I would say it depends more on your needs than on the size.
I would say the minimum is 4 to make sure you have your
Hello there,
IMHO, 5-8 servers are sufficient to start with. But it's all
relative to the data you have and the intensity of your reads/writes. You
should have different strategies though, based on whether it's 'read' or
'write'. You actually can't define 'big' in absolute terms.
I am more interested in the CompactionPolicy that is available, which allows
the application to influence how compaction should go... It looks like there
is a newer API in the 0.97 version
Oh, you already have heavyweight's input :).
Thanks JM.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sat, Jun 22, 2013 at 8:05 PM, Mohammad Tariq donta...@gmail.com wrote:
Hello there,
IMHO, 5-8 servers are sufficient to start with. But it's
all relative to the data you
Hi, All...
We have a use case that intends to run MapReduce in the background, while the
server serves online operations. The MapReduce job may have lower priority
compared to the online jobs.
I know this is a different use case of MapReduce compared to its
originally targeted scenario (where
Thanks for your response.
Now if 5 servers are enough, how can I install and configure my nodes?
If I need 3 replicas to guard against data loss, I should have at least 3
datanodes; we still have the namenode, regionserver and HMaster nodes, and
zookeeper nodes, so some of them must be installed on the same
With 8 machines you can do something like this :
Machine 1 - NN+JT
Machine 2 - SNN+ZK1
Machine 3 - HM+ZK2
Machine 4-8 - DN+TT+RS
(You can run ZK3 on a slave node with some additional memory).
DN and RS run on the same machine. Although RSs are said to hold the data,
the data is actually stored
You HAVE TO run a ZK3, or else you don't need to have ZK2 and any ZK
failure will be an issue. You need to have an odd number of ZK
servers...
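The reason for the odd count is the majority quorum: a ZooKeeper ensemble stays up only while a strict majority of servers are alive, so an even-sized ensemble adds a server without adding failure tolerance. A quick calculation:

```python
# Why an odd number of ZooKeeper servers: the ensemble needs a strict
# majority (quorum) to stay up, so an even count adds a server without
# adding any failure tolerance.

def zk_failure_tolerance(n):
    """Number of servers that can fail while a quorum (majority) survives."""
    quorum = n // 2 + 1
    return n - quorum

for n in (1, 2, 3, 4, 5):
    print(n, "servers tolerate", zk_failure_tolerance(n), "failure(s)")
# 3 servers tolerate 1 failure, and so do 4 - the 4th server buys nothing.
```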
Also, if you don't run MR jobs, you don't need the TT and JT... Else,
everything below is correct. But there are many other options; it all
depends on your
If you run ZK with a DN/TT/RS please make sure to dedicate a hard drive and
a core to the ZK process. I have seen many strange occurrences.
On Jun 22, 2013 12:10 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org
wrote:
You HAVE TO run a ZK3, or else you don't need to have ZK2 and any ZK
failure
Yeah, I forgot to mention that the number of ZKs should be odd. Perhaps those
parentheses made that statement look optional. Just to
clarify, it is mandatory.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sat, Jun 22, 2013 at 9:45 PM, Kevin O'dell kevin.od...@cloudera.com wrote:
If
Hi Rahit,
The list is a bad idea. When you have millions of rows per
region, are you going to put millions of them in memory in your list?
Your MR will scan the entire table, row by row. If you modify the
current row, then when the scanner searches for the next one, it will
not look at the current
Hi Tony,
Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix), a SQL
skin over HBase? It has a skip scan that will let you model a multi-part row
key and skip through it efficiently, as you've described. Take a look at this
blog for more info:
Thanks JM, I am not so concerned about holding those rows in memory because
they are mostly ordered integers and I would be using a bitset. So I have
some leeway in that sense. My dilemma was
1. updating instantly within the map
2. bulk updating at the end of the map
Yes I do understand the
Yes, you can change your task tracker startup script to use nice and ionice
and restart the task tracker process. The mappers and reducers spawned by
this task tracker will inherit the niceness.
See the first comment in
http://blog.cloudera.com/blog/2011/04/hbase-dos-and-donts/
Quoting:
change the
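The inheritance point (children pick up the parent's niceness, which is why re-nicing the task tracker re-nices its mappers and reducers) can be demonstrated with a small, Hadoop-independent sketch on a POSIX system:

```python
# Small demo (not Hadoop-specific) that child processes inherit the
# parent's niceness, which is why re-nicing the task tracker process
# also re-nices the mappers and reducers it spawns.

import os
import subprocess
import sys

os.nice(5)  # raise our own niceness by 5 (always permitted, unlike lowering)

# The child process reports its own niceness; it should match the parent's.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.nice(0))"],
    capture_output=True, text=True)

print("parent niceness:", os.nice(0))
print("child niceness: ", child.stdout.strip())
```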
In contrast, the major compaction is invoked at off-peak times and usually
can be assumed to have resources exclusively.
There is no resource exclusivity with major compactions. It is just more
resource _intensive_ because a major compaction will rewrite all the store
files to end up with a single
Hi Rohit,
It will always be consistent. I don't see why there would be any
inconsistency with the scenario you described below.
JM
2013/6/22 Rohit Kelkar rohitkel...@gmail.com:
Thanks JM, I am not so concerned about holding those rows in memory because
they are mostly ordered integers and I
Hi Mohammad,
I am curious why you chose not to put the third ZK on the NN+JT? I was
planning on doing that on a new cluster and want to confirm it would be
okay.
--
Iain Wright
Cell: (562) 852-5916
http://www.labctsi.org/
Hello Iain,
You would put a lot of pressure on the RAM if you do that. The NN
already has high memory requirements, and then having JT+ZK on the same
machine would be too heavy, IMHO.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Sun, Jun 23, 2013 at 4:07 AM, iain wright iainw...@gmail.com
Major compactions flood the network, leaving too little bandwidth for other
operations. That is the reason why major compactions are
so prohibitively expensive in HBase: 2 block replicas need to be created
in the cluster for every block written locally.
Best regards,
Vladimir Rodionov
Principal Platform
After extracting and changing the etc/hosts file, I made some changes in the
hdfs-site.xml and hbase-env.sh files. I can't see any HBase process
running after issuing the bin/start-hbase.sh command.
my hdfs-site.xml file is
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
Is there anything in the log files? Check both logs/*.out and logs/*.log.
On Sun, Jun 23, 2013 at 6:54 AM, Rajkumar rajkumar22...@gmail.com wrote:
After extracting and changing the etc/hosts file, I made some changes in the
hdfs-site.xml and hbase-env.sh files. I can't see any HBase process
running