Same for me, I had faced similar issues, especially on my virtual machines,
since I would restart them more often than my host machine.
Moving the ZK data directory out of /tmp, which can get cleared on reboots,
fixed the issue for me.
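(For reference, when HBase manages ZK this is the hbase-site.xml property;
the path below is illustrative:)
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/var/lib/zookeeper</value>
</property>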
Thanks,
Viral
On Sun, May 17, 2015 at 10:39 PM, Lars George lars.geo...@gmail.com
when decoding fails with an uncaught
exception.
Release 1.5.0.
Brandon Forehand (1):
Add support for prefetching the meta region.
Phil Smith (1):
Here's some one-liners to compile and run tests.
St.Ack (1):
Make mvn build accommodate protobuf files
Viral Bajaria
Pradeep, correct me if I am wrong, but prestodb has not released the HBase
plugin yet, or did they and maybe I missed the announcement?
I agree with what Doug is saying here: you can't achieve 100ms on every
kind of query on HBase unless and until you design the rowkey in a way to
help you.
Viral Bajaria (2):
Initial commit for ScanFilter.
Add more scanner filters.
Xun Liu (1):
Properly honor timestamps in DeleteRequest.
--
Benoit "tsuna" Sigoure
Hi Tony,
I know it's been a while and am not sure if you already figured out the
issue, but try taking a look at HBASE-9079 and see if it's similar to the
problem that you are facing with FuzzyRowFilter. I have attached a patch to
that ticket too and have verified that it fixed things for me in
Attached are 2 patches: one of them is TestFail.patch, where I show that
the behavior is not as expected; the other patch has the changes that I
made to FilterList, with which the behavior is as expected.
I have tested the state maintenance on two filters that implement
Attached the two test patches to this JIRA:
https://issues.apache.org/jira/browse/HBASE-9079
On Mon, Jul 29, 2013 at 4:36 PM, Ted Yu yuzhih...@gmail.com wrote:
Can you log a JIRA and attach the patches there ?
Your attachments did not go through.
Hi,
I hit a weird issue/bug and am able to reproduce the error consistently.
The problem arises when FilterList has two filters where each implements
the getNextKeyHint method.
The way the current implementation works is that StoreScanner will call
matcher.getNextKeyHint() whenever it gets a
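(A minimal sketch of the shape of the failing setup, against the 0.94 API;
the row templates and fuzzy masks are made up for illustration, where a
mask byte of 0 means the byte must match and 1 means wildcard:)
import java.util.Arrays;
import java.util.Collections;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

// Two hint-providing filters (each implements getNextKeyHint) in one list.
Filter f1 = new FuzzyRowFilter(Collections.singletonList(
    new Pair<byte[], byte[]>(Bytes.toBytes("aa_0"), new byte[] {0, 0, 0, 1})));
Filter f2 = new FuzzyRowFilter(Collections.singletonList(
    new Pair<byte[], byte[]>(Bytes.toBytes("bb_0"), new byte[] {0, 0, 0, 1})));
Scan scan = new Scan();
scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ONE,
    Arrays.asList(f1, f2)));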
Thanks guys for going through that never-ending email! I will create the
JIRA for block cache eviction and the regionserver assignment command. Ted
already pointed to the JIRA which tries to go to a different datanode if the
primary is busy (I will add comments to that one).
To answer Andrew's
I was able to reproduce the same regionserver asking for the same local
block over 300 times within the same 2 minute window by running one of my
heavy workloads.
Let me try and gather some stack dumps. I agree that jstack crashing the
jvm is concerning but there is nothing in the errors to know
Good question. When I looked at the logs, it's not clear from them whether
it's reading a meta or data block. Is there any kind of log line that
indicates that? Given that it's saying that it's reading from a startOffset,
I would assume this is a data block.
A question that comes to mind, is this
We haven't disabled the block cache, so I doubt that's the problem.
On Mon, Jul 8, 2013 at 4:50 PM, Varun Sharma va...@pinterest.com wrote:
FYI, if you disable your block cache, you will ask for index blocks on
every single request. So such a high rate of requests is plausible for
index blocks even
Asaf, the hdfsBlocksLocalityIndex is around 76 and it's 86 for the
regionserver which is under the heaviest load for IO.
Ram, I saw that you updated the JIRA saying the checksum metrics are
available in the regionserver. What group are they published under ? I
checked my ganglia stats and can't
I saw the same code and also saw the following in RegionServerMetrics.java
/**
* Number of times checksum verification failed.
*/
public final MetricsLongValue checksumFailuresCount =
    new MetricsLongValue("checksumFailuresCount", registry);
The registry is then registered in JMX via:
//
Yes I was checking 0.94 code.
And sorry for the brain fart, I just spotted the metric in ganglia. There
are just too many metrics in ganglia and I skipped this one! It was under
the group hbase.regionserver, while I was expecting it to be under
hbase.regionserver.RegionServerStatistics.
The chart shows
No worries, Anoop. Here is some clarification for this chain.
It started as an attempt to figure out how to check whether SCR is
effective at the RS or not. I could not find the metric anywhere in
ganglia/JMX and didn't find any RegionServer-level metric either, and so
started looking at my DN logs. I
Sweet! I enabled debug logging for org.apache.hadoop.hdfs.DFSClient and
found the "New BlockReaderLocal" log line. Got some verification that SCR
is ON and working fine.
Regarding no clienttrace lines in the DN, I verified that too. Last time I
saw a few lines because I had forgotten to filter out the HDFS_WRITE lines.
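(For anyone else trying this, the log4j.properties entry is just:)
log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG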
Hi,
If I have enabled short-circuit reads, should I ever be seeing clienttrace
logs in the datanode for the regionserver DFSClient that is co-located with
the datanode?
Besides that, is there any other way to verify that my setting for
short-circuit reads is working fine?
Thanks,
Viral
I looked up the ganglia metrics that I have set up for the cluster (both
HBase and HDFS) and don't see it there. Is it not published to ganglia?
On Wed, Jul 3, 2013 at 11:33 PM, Asaf Mesika asaf.mes...@gmail.com wrote:
I think there is a metric in HBase and HDFS (JMX) reflecting that.
If you
Currently the datanode shows a lot of clienttrace logs for DFSClient. I did
a quick command line check to see how many clienttrace lines I get per
active RegionServer, and it seems the local RegionServer had very few (< 1%).
Given that datanode logs are too noisy with clienttrace, I was hoping to
find the
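(The quick check can be done with something along these lines; the log file
name and IP are illustrative:)
grep -c clienttrace hadoop-hdfs-datanode-*.log                    # total clienttrace lines
grep clienttrace hadoop-hdfs-datanode-*.log | grep -c 10.0.0.12   # lines for one RS's IP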
Created the JIRA at: https://issues.apache.org/jira/browse/HBASE-8868
Sorry if I got a few fields wrong, will learn from this one to open better
JIRAs going forward.
Thanks,
Viral
On Thu, Jul 4, 2013 at 2:02 AM, ramkrishna vasudevan
ramkrishna.s.vasude...@gmail.com wrote:
I think we should
I ended up writing a tool which helps merge the table regions into a target
# of regions. For example, if you want to go from N to N/8, the tool
figures out the grouping and merges them in one pass. I will put it up in a
github repo soon and share it here.
The sad part of this approach is the
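(For reference, 0.94 ships an offline merge utility that merges two
adjacent regions at a time and requires the cluster to be down; the region
names below are placeholders:)
hbase org.apache.hadoop.hbase.util.Merge <tablename> <region1-name> <region2-name>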
Found this while going through the online merge jira...
https://issues.apache.org/jira/browse/HBASE-8217
The comments were interesting, and I as a user would agree that supplying
a patch is good and it's on me to decide whether I should use it or not.
The core committee obviously is
When you did the scan, did you check what the bottleneck was ? Was it I/O ?
Did you see any GC locks ? How much RAM are you giving to your RS ?
-Viral
On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote:
To completely scan the table for all 140 columns, it takes around 30-40
Hi,
Just wanted to check if it's safe to use the JIRA mentioned in the subject
i.e. https://issues.apache.org/jira/browse/HBASE-7846
Thanks,
Viral
On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain vkj...@gmail.com wrote:
Sorry for the typo .. please ignore previous mail.. Here is the corrected
one..
1) I have around 140 columns for each row; out of 140, around 100 columns
hold Java primitive data types, and the remaining 40 columns contain
You can use the hbase shell and run major_compact 'tablename', or you
could run:
echo "major_compact 'tablename'" | hbase shell
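If you'd rather trigger it from code, the Java admin API has the equivalent
(a minimal sketch; 'tablename' is a placeholder, and the call is
asynchronous):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
admin.majorCompact("tablename"); // queues the major compaction and returns
admin.close();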
On Sun, Jun 30, 2013 at 7:51 PM, ch huang justlo...@gmail.com wrote:
I want to clean up the data that is deleted; the question is which command
I can execute on the command line?
thanks
On Fri, Jun 28, 2013 at 9:31 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:
On Thu, Jun 27, 2013 at 4:27 PM, Viral Bajaria viral.baja...@gmail.com
wrote:
It's not random, it picks the region with the most data in its memstores.
That's weird, because I see some of my regions which receive
Hi All,
I wanted some help in understanding what's going on with my current setup.
I updated my config to the following settings:
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
Thanks for the quick response Anoop.
The current memstore reservation (IIRC) would be 0.35 of the total heap,
right?
The RS total heap is 10231MB, used is at 5000MB. Total number of regions is
217 and there are approx 150 regions with 2 families, ~60 with 1 family and
remaining with 3 families.
How to
Thanks Liang!
Found the logs. I had gone overboard with my greps and missed the "Too
many hlogs" line for the regions that I was trying to debug.
A few sample log lines:
2013-06-27 07:42:49,602 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Too many hlogs: logs=33, maxlogs=32; forcing flush
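(maxlogs=32 is the hbase.regionserver.maxlogs default; if you want to allow
more WALs to accumulate before a forced flush, it can be raised in
hbase-site.xml. The value below is just an example:)
<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>64</value>
</property>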
0.94.4 with plans to upgrade to the latest 0.94 release.
On Thu, Jun 27, 2013 at 2:22 AM, Azuryy Yu azury...@gmail.com wrote:
hey Viral,
Which hbase version are you using?
I do have a heavy write operation going on. Actually heavy is relative. Not
all tables/regions are seeing the same amount of writes at the same time.
There is definitely a burst of writes that can happen on some regions. In
addition to that there are some processing jobs which play catch up and
Thanks Azuryy. Look forward to it.
Does DEFERRED_LOG_FLUSH impact the number of WAL files that will be
created? I tried looking around but could not find the details.
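(For context, a sketch of how the flag is set against the 0.94 API; the
table name is a placeholder. As far as I understand, it defers the WAL sync
rather than changing when WALs are rolled:)
import org.apache.hadoop.hbase.HTableDescriptor;

HTableDescriptor htd = new HTableDescriptor("mytable");
// Edits are synced to the WAL periodically instead of on every mutation.
htd.setDeferredLogFlush(true);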
On Thu, Jun 27, 2013 at 7:53 AM, Azuryy Yu azury...@gmail.com wrote:
your JVM options are not enough. I will give you some detail
Hey JD,
Thanks for the clarification. I also came across a previous thread which
sort of talks about a similar problem.
http://mail-archives.apache.org/mod_mbox/hbase-user/201204.mbox/%3ccagptdnfwnrsnqv7n3wgje-ichzpx-cxn1tbchgwrpohgcos...@mail.gmail.com%3E
I guess my problem is also similar to
and then
eventually dropping the table.
-Viral
On Tue, Jun 25, 2013 at 5:20 PM, Viral Bajaria viral.baja...@gmail.com wrote:
Hi JM,
Yeah you are right about when the exception happens. I just went through
all the logs of table creation and don't see an exception. Though there
was a LONG pause when
Hi,
I created a new table on my cluster today and hit a weird issue which I
have not come across before. I wanted to run it by the list and see if
anyone has seen this issue before and if not should I open a JIRA for it.
it's still unclear why it would happen. I create the
table
fully created. So it's just normal for HBase to not open it. The issue is
at creation time. Do you still have the logs?
Thanks,
JM
2013/6/25 Viral Bajaria viral.baja...@gmail.com:
Hi,
I created a new table on my cluster today and hit a weird issue which I
have not come across before
The shell allows you to use filters just like the standard HBase API but
with jruby syntax. Have you tried that, or is that too painful and you want
a simpler tool?
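(For example, something like this should work in the shell; untested from
memory, and the table and prefix are made up:)
import org.apache.hadoop.hbase.filter.PrefixFilter
import org.apache.hadoop.hbase.util.Bytes
scan 'mytable', {FILTER => PrefixFilter.new(Bytes.toBytes('row_'))}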
-Viral
On Tue, May 21, 2013 at 2:58 PM, Aji Janis aji1...@gmail.com wrote:
are there any tools out there that can help in
Thanks for all the help in advance!
Answers inline..
Hi Viral,
some questions:
Are you adding new data or deleting data over time?
Yes I am continuously adding new data. The puts have not slowed down but
that could also be an after effect of deferred log flush.
Do you have bloom
On Fri, May 17, 2013 at 8:23 AM, Jeremy Carroll phobos...@gmail.com wrote:
Look at how much hard disk utilization you have (IOPS / svctm). You may
just be under-scaled for the QPS you desire for both read + write load. If
you are performing random gets, you could expect around the low to mid
Hi,
My setup is as follows:
24 regionservers (7GB RAM, 8-core CPU, 5GB heap space)
hbase 0.94.4
5-7 regions per regionserver
I am doing an avg of 4k-5k random gets per regionserver per second and the
performance is acceptable in the beginning. I have also done ~10K gets for
a single regionserver
Have you checked your HBase environment? I think it perhaps comes from:
1) The system uses swap more frequently as you continue to execute Get
operations?
I have set swap to 0. AFAIK, that's a recommended practice. Let me know if
that should not be followed for nodes running HBase.
2) check
This generally happens when the same block is accessed for the HFile. Are
you seeing any contention on the HDFS side?
When you say contention, what should I be looking for? Slow responses to
data block requests? Or some specific metric in ganglia?
-Viral
Going from memory, the swap setting of 0 is a suggestion. You may still
actually swap, but I think it's a 'last resort' type of thing.
When you look at top, at the top of the page, how much swap do you see?
When I look at top it says: 0K total, 0K used, 0K free (as expected). I can
try
If you're not swapping then don't worry about it.
My comment was that even though you set the swap to 0, and I'm going from
memory, it's possible for some swap to occur.
(But I could be wrong.)
Thanks for sharing this info. Will remember for future debugging too.
Checked the vm.swappiness
Hi,
I have been consistently hitting the following error in one of my QA
clusters. I came across two JIRAs; the first one (HBASE-3466) was closed as
"Cannot Reproduce" but a new one was re-opened under HBASE-5285.
I am using HBase 0.94.4 and Hadoop 1.0.4
24 region servers (8 cores, 8GB RAM)
On Sun, May 5, 2013 at 10:45 PM, ramkrishna vasudevan
ramkrishna.s.vasude...@gmail.com wrote:
Just to confirm, you are getting this with LruBlockCache? If with
LruBlockCache then the issue is critical.
Because we have faced a similar issue with OffHeapCache. But that is not
yet stable as far
On Mon, Apr 29, 2013 at 10:54 PM, Asaf Mesika asaf.mes...@gmail.com wrote:
I think for Phoenix to truly succeed, it needs HBase to break the JVM heap
barrier of 12G, as I saw mentioned in a couple of posts, since lots of
analytics queries utilize memory, and since its memory is shared with
Thanks for getting back, Ted. I totally understand other priorities and
will wait for some feedback. I am adding some more info to this post to
allow better diagnosis of the performance issue.
I hit my region servers with a lot of GET requests (~20K per second per
regionserver) using asynchbase in my test
I am using asynchbase which does not have the notion of batch gets. It
allows you to batch at a rowkey level in a single get request.
-Viral
On Mon, Apr 29, 2013 at 11:29 PM, Anoop John anoop.hb...@gmail.com wrote:
You are making use of batch Gets? get(List<Get>)
-Anoop-
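(For anyone on the standard synchronous client, that looks roughly like the
sketch below; the table and row names are placeholders:)
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "mytable");
List<Get> gets = new ArrayList<Get>();
gets.add(new Get(Bytes.toBytes("row1")));
gets.add(new Get(Bytes.toBytes("row2")));
// One batched call instead of one RPC per Get.
Result[] results = table.get(gets);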
Looked closely into the async API and there is no way to batch GETs to
reduce the # of RPC calls and thus handlers. Will play around tomorrow with
the handlers again and see if I can find anything interesting.
On Tue, Apr 30, 2013 at 12:03 AM, Anoop John anoop.hb...@gmail.com wrote:
If you can
On Sun, Apr 28, 2013 at 7:37 PM, ramkrishna vasudevan
ramkrishna.s.vasude...@gmail.com wrote:
So you mean that when the handler count is more than 5k this happens, and
when it is lower it does not? Have you repeated this behaviour?
What I doubt is when you say bouncing around different
On Mon, Apr 29, 2013 at 2:25 AM, Ted Yu yuzhih...@gmail.com wrote:
I noticed the 8 occurrences of 0x703e... following the region server name
in the abort message.
I wonder why the repetition?
Cheers
Oh good observation. I just stepped through the logs again and saw that the
client timeout
On Mon, Apr 29, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com wrote:
After each zookeeper reconnect, I saw the same session Id (0x703e...)
What version of zookeeper are you using ? Can you search zookeeper log for
this session Id to see what happened ?
Thanks
Zookeeper version is 3.4.5,
which
had fewer handlers (< 15K), it stopped bouncing around. I am surprised that
bumping up the handlers and having 0 traffic on the cluster can cause this
issue.
-Viral
On Mon, Apr 29, 2013 at 1:23 PM, Viral Bajaria viral.baja...@gmail.com wrote:
On Mon, Apr 29, 2013 at 2:49 AM, Ted Yu yuzhih...@gmail.com
Hi,
I have been trying to play around with the regionserver handler count. What
I noticed was, the cluster comes up fine up to a certain point, ~7500
regionserver handlers. But above that the system refuses to start up. The
servers keep spinning at a certain point. The ROOT region keeps on
Yu yuzhih...@gmail.com wrote:
bq. the setting is per regionserver (as the name suggests) and not per
region right ?
That is correct.
Can you give us more information about your cluster size, workload, etc ?
Thanks
On Mon, Apr 29, 2013 at 4:30 AM, Viral Bajaria viral.baja...@gmail.com
Phoenix might be able to solve the problem if the keys are structured in
the binary format that it understands; otherwise you are better off
reloading that data into a table created via Phoenix. But I will let James
tackle this question.
Regarding your use-case, why can't you do the aggregation using
+1!
On Fri, Apr 19, 2013 at 4:09 PM, Marcos Luis Ortiz Valmaseda
marcosluis2...@gmail.com wrote:
Wow, great work, Doug.
2013/4/19 Doug Meil doug.m...@explorysmedical.com
Hi folks,
I reorganized the Schema Design case studies 2 weeks ago and consolidated
them here, plus added
I think this whole idea of "don't go over a certain number of column
families" was a 2+ year old story. I remember hearing numbers like 5 or 6
(not 3) come up when talking at Hadoop conferences with engineers who were
at companies that were heavy HBase users. I agree with Andrew's suggestion
that we
Are you sure that your hbase regionserver is registered with the external
IP in zookeeper? Your client (laptop) might be trying to connect to the ec2
hbase using the internal hostname, which will not get resolved. To do a
quick test, just modify the /etc/hosts on your laptop and put both the ec2
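(i.e., a hypothetical /etc/hosts line mapping the internal EC2 hostname to
the public IP:)
54.23.45.67   ip-10-0-0-12.ec2.internal   ip-10-0-0-12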
Most of the clients listed below are language-specific, so if your
benchmarking scripts are written in Java, you are better off running the
Java client.
The HBase shell is more for running something interactive; not sure how you
plan to benchmark that.
REST is something that you could use, but I can't
How often do you run those jobs ? Do they run periodically or are they
running all the time ?
If you have a predictable periodic behavior, you could disable automatic
compaction and trigger it manually using a cron job (not the recommended
approach, AFAIK). Or you could set the compaction to
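(For the first option, time-based major compactions are turned off with
this hbase-site.xml property; a value of 0 disables the periodic trigger:)
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>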
Well, if you can afford a longer downtime, you can always distcp your
existing hbase data. This way, if things get screwed up you can always
restore a 0.90.x on that old backup. You cannot distcp while the cluster is
running since it will not be able to get locks on files (I think I faced
that issue
Cool!!! This is really good. I have a quick question though: is it possible
to use Phoenix over existing tables? I doubt it but just thought I would
ask it on the list.
On Tue, Feb 26, 2013 at 11:17 AM, Stack st...@duboce.net wrote:
On Tue, Feb 26, 2013 at 10:02 AM, Graeme Wallace
Also, readFields is your implementation of how to read the byte array
transferred from the client. So I think there has to be some issue in how
you write the byte array to the network and what you are reading out of it,
i.e. the sizes of the arrays might not be identical.
But as Ted mentioned,
The HBase shell is a jruby shell, so you can invoke any Java commands from
it.
For example:
import org.apache.hadoop.hbase.util.Bytes
Bytes.toLong(Bytes.toBytes(1000))
Not sure if this works as expected since I don't have a terminal in front
of me but you could try (assuming the SPLITS keyword
I have come across this too, I think someone with authorization needs to
perform a maven release to the apache maven repository and/or maven central.
For now, I just end up compiling the dot release from trunk and deploying
it to my local repository for other projects to use.
Thanks,
Viral
On Tue,
Hi Varun,
Are your gets around sequential keys? If so, you might benefit from doing
scans with a start and stop row. If they are not sequential, I don't think
there is a better way, given how you describe the problem.
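(Roughly like this sketch; the keys are made up:)
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("user123_0000")); // inclusive
scan.setStopRow(Bytes.toBytes("user123_9999"));  // exclusive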
Besides that, some of the questions that come to mind:
- How many GET(s) are
Are all these dupe events expected to be within the same hour or can they
happen over multiple hours?
Viral
From: Rahul Ravindran
Sent: 2/14/2013 11:41 AM
To: user@hbase.apache.org
Subject: Using HBase for Deduping
Hi,
We have events which are delivered into our HDFS cluster which may
be
On Thu, Feb 14, 2013 at 12:29 PM, Rahul Ravindran rahu...@yahoo.com wrote:
Most will be in the same hour. Some will be across 3-6 hours.
Sent from my phone. Excuse the terseness.
On Feb 14, 2013, at 12:19 PM, Viral Bajaria viral.baja...@gmail.com
wrote:
Are all these dupe events expected
is that, doing a lookup per event within the
MR job is going to be bad?
From: Viral Bajaria viral.baja...@gmail.com
To: Rahul Ravindran rahu...@yahoo.com
Cc: user@hbase.apache.org user@hbase.apache.org
Sent: Thursday, February 14, 2013 12:48 PM
Subject: Re: Using
Hi,
I am creating a new table and want to pre-split the regions and am seeing
some weird behavior.
My table is designed with a rowkey that is a composite of multiple
fixed-length byte arrays separated by a control character (for simplicity's
sake we can say the separator is _underscore_). The prefix of this rowkey
I was able to figure it out. I had to use the createTable API that takes
splitKeys instead of the startKey, endKey and numPartitions.
If anyone comes across this issue and needs more details, feel free to ping
me.
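(For reference, a sketch of that variant; the table name, family and split
points here are made up:)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("mytable");
desc.addFamily(new HColumnDescriptor("d"));
// One region per gap: 3 split keys -> 4 regions.
byte[][] splitKeys = new byte[][] {
    Bytes.toBytes("aaaa_"),
    Bytes.toBytes("bbbb_"),
    Bytes.toBytes("cccc_"),
};
admin.createTable(desc, splitKeys);
admin.close();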
Thanks,
Viral
On Thu, Feb 14, 2013 at 7:30 PM, Viral Bajaria viral.baja
Congrats guys!!! This is something that was sorely missing in what I am
trying to build... will definitely try it out... just out of curiosity,
what kind of projects/tools at SalesForce use this library?
On Wed, Jan 30, 2013 at 5:55 PM, Huanyou Chang mapba...@mapbased.com wrote:
Great tool, I
When you say indexing, are you referring to indexing the column qualifiers
or the values that you are storing in the qualifier ?
Regarding indexing, I remember someone had recommended this on the mailing
list before: https://github.com/ykulbak/ihbase/wiki but it seems the
development on that is
the source repository ?
-Viral
On Mon, Jan 28, 2013 at 7:43 AM, Vandana Ayyalasomayajula
avand...@yahoo-inc.com wrote:
Hi Viral,
Try adding -Psecurity and then compiling.
Thanks
Vandana
Sent from my iPhone
On Jan 28, 2013, at 3:05 AM, Viral Bajaria viral.baja...@gmail.com
wrote:
Hi
and -Dhadoop.profile=23.
That should work.
Thanks
Vandana
On Jan 28, 2013, at 11:48 AM, Viral Bajaria wrote:
Thanks Vandana for the reply. I tried that but no luck; it still throws the
same error. I thought there might have been a typo and you meant -D and not
-P, but neither of them worked
, 2013 at 5:58 PM, Viral Bajaria viral.baja...@gmail.com wrote:
Tried all of it, I think I will have to defer this to the hadoop mailing
list because it seems there is a missing class in hadoop 0.23 branches, not
sure if that is intentional. The class exists in trunk and hadoop 2.0
branches. Though
restart Hbase cluster?
Is that one unassigned task even getting assigned, or did it error out and
Hi,
Is your dfs.datanode.handler.count set to the default value of 3? I think
I bumped it up when I got these exceptions, and the issue wasn't due to
xcievers. I would recommend increasing it to 6 and seeing if the error goes
away or the frequency of the error decreases.
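(That's the hdfs-site.xml property, i.e. something like:)
<property>
  <name>dfs.datanode.handler.count</name>
  <value>6</value>
</property>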
Thanks,
Viral
On Wed,