Hi Mahesha,
To answer your question #2: both strings and numbers are stored as a byte[].
The value 25 can be serialized as a byte[] in many ways:
1. As a numeric string, by storing the value as [ 50, 53 ], where 50 is the byte that represents the character '2' and 53 is the byte for the character '5'.
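A minimal sketch of both encodings using HBase's Bytes utility (the class wrapper is just for illustration):

    import org.apache.hadoop.hbase.util.Bytes;

    public class EncodingSketch {
        public static void main(String[] args) {
            // 1. As a numeric string: '2' -> 50, '5' -> 53
            byte[] asString = Bytes.toBytes("25");   // [50, 53], 2 bytes
            // 2. As a fixed-width binary integer (big-endian)
            byte[] asInt = Bytes.toBytes(25);        // [0, 0, 0, 25], 4 bytes
            // Reads must decode with the same scheme the writer used
            System.out.println(Bytes.toString(asString)); // "25"
            System.out.println(Bytes.toInt(asInt));       // 25
        }
    }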
Is it possible to deploy an endpoint coprocessor via HDFS or must I
distribute the jar file to each regionserver individually?
In my testing, it appears the endpoint coprocessors cannot be loaded from HDFS, though I'm not at all sure I'm doing it right (are the delimiters ':' or '|' when I use hdfs:/// ?) ... there is no error message of what went wrong.
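For reference, the table-attribute form of coprocessor loading delimits its fields with '|' (jar path|class|priority|args). A sketch of attaching an endpoint jar from HDFS through the client API; the jar path and class name are hypothetical:

    import java.util.HashMap;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.Coprocessor;
    import org.apache.hadoop.hbase.HTableDescriptor;

    public class AttachEndpointSketch {
        public static HTableDescriptor describe() throws Exception {
            HTableDescriptor desc = new HTableDescriptor("mytable");
            // Attribute-string equivalent: "<hdfs jar path>|<class>|<priority>|<args>"
            desc.addCoprocessor("com.example.MyEndpoint",
                    new Path("hdfs:///hbase/lib/my-endpoint.jar"),
                    Coprocessor.PRIORITY_USER,
                    new HashMap<String, String>());
            return desc;
        }
    }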
On Mon, Oct 27, 2014 at 2:03 PM, Tom Brown tombrow...@gmail.com wrote:
Is it possible to deploy an endpoint coprocessor via HDFS or must I
distribute the jar file to each regionserver individually?
In my testing, it appears the endpoint coprocessors ... the coprocessor initially is not in use when the coprocessor is actually invoked.
--Tom
On Mon, Oct 27, 2014 at 3:42 PM, Tom Brown tombrow...@gmail.com wrote:
I'm not sure how to tell if it is a region endpoint or a region server
endpoint.
I have not had to explicitly associate the coprocessor ...?
On Mon, Oct 27, 2014 at 4:00 PM, Tom Brown tombrow...@gmail.com wrote:
I tried to attach the coprocessor directly to a table, and it is able to
load the coprocessor class. Unfortunately, when I try to use the
coprocessor I get a ClassNotFoundException on one of the supporting
classes
Hello,
I wish to manually specify the hostname that the master uses to address a
particular regionserver. My machines have multiple names (a friendly one,
and an internal VMish one). Both names resolve just fine within the
environment, but only the friendly ones resolve from outside the
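For what it's worth, the 0.94-era settings that control which name a regionserver derives for itself are the DNS interface/nameserver knobs. A sketch with hypothetical values (these would normally live in hbase-site.xml):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class DnsConfigSketch {
        public static Configuration regionServerConf() {
            Configuration conf = HBaseConfiguration.create();
            // Pick which NIC and which nameserver the regionserver uses to
            // work out the hostname it registers with the master
            conf.set("hbase.regionserver.dns.interface", "eth0");      // hypothetical NIC
            conf.set("hbase.regionserver.dns.nameserver", "10.0.0.2"); // hypothetical DNS
            return conf;
        }
    }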
On Wed, Jun 25, 2014 at 12:09 PM, Tom Brown tombrow...@gmail.com wrote:
Yes, that stack is still there:
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion
Brown tombrow...@gmail.com wrote:
Could this happen if the master is running too many RPC tasks and can't
keep up? What about if there's too many connections to the server?
--Tom
On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown tombrow...@gmail.com
wrote:
That server is the master
at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:282)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:127)
On Wed, Jun 25, 2014 at 10:09 AM, Tom Brown tombrow...@gmail.com wrote:
Before I was able to acquire
Hello all,
I'm trying to view the master status of a 6-node (0.94.10; Hadoop 1.1.2)
cluster but I keep getting a timeout exception.
The rest of the cluster is operating quite normally. From the exception, it
seems like the list tables function (required to display the web UI) is
timing out for
That server is the master and is not a regionserver.
--Tom
On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu yuzhih...@gmail.com wrote:
Have you checked the region server log on 10.100.101.221 ?
Cheers
On Wed, Jun 18, 2014 at 10:19 AM, Tom
Could this happen if the master is running too many RPC tasks and can't
keep up? What about if there's too many connections to the server?
--Tom
On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown tombrow...@gmail.com wrote:
That server is the master and is not a regionserver.
--Tom
On Wed, Jun
I don't mean to hijack the thread, but this question seems relevant:
Does data block encoding also help performance, or does it just enable more
efficient compression?
--Tom
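For context, data block encoding is set per column family, and encoded blocks stay encoded in the block cache, so more cells fit in memory; that can help read performance beyond plain on-disk savings. A sketch, assuming a column family named 'd':

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

    public class EncodingFamilySketch {
        public static HColumnDescriptor family() {
            HColumnDescriptor cf = new HColumnDescriptor("d");
            // FAST_DIFF trades a little CPU for much smaller blocks,
            // on disk and in the block cache alike
            cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
            return cf;
        }
    }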
On Saturday, June 14, 2014, Guillermo Ortiz konstt2...@gmail.com wrote:
I would like to see the times they got doing
Last night a regionserver in my cluster stopped responding in a timely
manner for about 20 minutes. I know that stop-the-world GC can cause this
type of behavior, but 20 minutes seems excessive.
The server is a 2-core VM with 16GB of RAM (HBase max heap is 12GB). We are using the latest Java 7
We are still using 0.94.10. We are looking at upgrading soon, but have not
done so yet.
--Tom
On Tue, Jun 10, 2014 at 12:10 PM, Ted Yu yuzhih...@gmail.com wrote:
Which release are you using ?
In 0.98+, there is JvmPauseMonitor.
Cheers
On Tue, Jun 10, 2014 at 11:05 AM, Tom Brown tombrow
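The JvmPauseMonitor idea Ted mentions is simple enough to sketch independently, using nothing beyond the standard library: sleep a fixed interval and complain whenever the observed elapsed time is far larger, which usually indicates a stop-the-world GC pause or a descheduled VM.

    public class PauseMonitorSketch implements Runnable {
        private static final long SLEEP_MS = 500;
        private static final long WARN_MS = 10000;

        public void run() {
            long last = System.currentTimeMillis();
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(SLEEP_MS);
                } catch (InterruptedException e) {
                    return;
                }
                long now = System.currentTimeMillis();
                long pause = now - last - SLEEP_MS; // time unaccounted for
                if (pause > WARN_MS) {
                    System.err.println("Detected pause of approximately " + pause + "ms");
                }
                last = now;
            }
        }
    }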
Carrier IQ, www.carrieriq.com
e-mail: vrodio...@carrieriq.com
From: Tom Brown [tombrow...@gmail.com]
Sent: Tuesday, June 10, 2014 11:13 AM
To: user@hbase.apache.org
Subject: Re: Is this a long GC pause, or something else?
We are still using 0.94.10. We
Subject: Re: Is this a long GC pause, or something else?
Does it repeat?
We are seeing this with the u60 Oracle JVM too! SPM shows the whole JVM
blocking for about 16 minutes every M minutes.
Otis
On Jun 10, 2014, at 2:05 PM, Tom Brown tombrow...@gmail.com wrote:
Last night
Can you check your server logs for a full stack trace? This sounds like it
could be similar to this:
On Tue, May 27, 2014 at 10:15 AM, Ted Yu yuzhih...@gmail.com wrote:
Can you confirm the version of HBase ?
To my knowledge, cdh5 is based on 0.96
Cheers
On Tue, May 27, 2014 at 1:36 AM,
Sorry, accidentally hit send... I meant to suggest this:
http://stackoverflow.com/questions/20257356/hbase-client-scan-could-not-initialize-org-apache-hadoop-hbase-util-classes/
--Tom
On Tue, May 27, 2014 at 11:14 AM, Tom Brown tombrow...@gmail.com wrote:
Can you check your server logs
I believe each cell stores its own copy of the entire row key, column
qualifier, and timestamp. Could that account for the increase in size?
--Tom
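A rough back-of-the-envelope, based on the 0.94 KeyValue layout (a fixed overhead plus a full copy of the cell's coordinates, per cell):

    public class CellSizeSketch {
        // keylen(4) + vallen(4) + rowlen(2) + row + famlen(1) + family
        // + qualifier + timestamp(8) + type(1) + value
        static long approxCellBytes(byte[] row, byte[] family,
                                    byte[] qualifier, byte[] value) {
            return 4 + 4 + 2 + row.length + 1 + family.length
                 + qualifier.length + 8 + 1 + value.length;
        }
    }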
On Mon, Jan 27, 2014 at 3:12 PM, Nick Xie nick.xie.had...@gmail.com wrote:
I'm importing a set of data into HBase. The CSV file contains 82 ... them...) Is that real?
In this case, should we do some combination to reduce the overhead?
Thanks,
Nick
On Mon, Jan 27, 2014 at 2:33 PM, Tom Brown tombrow...@gmail.com wrote:
I believe each cell stores its own copy of the entire row key, column
qualifier
The trade-off we make is to increase our write performance knowing it will
negatively impact our read performance. In our case, however, we write a
lot of rows that might never be read (depending on the specific deep-dive
queries that will be run), so it's an ok trade-off. However, our layout is
We have solved this by prefixing each key with a single byte. The byte is
based on a very simple 8-bit hash of the record. If you know exactly which
row you are looking for you can rehash your row to create the true key.
Scans are a little more complex because you have to issue 256 scans instead
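A sketch of the scheme as described above (the hash function here is illustrative; any stable 8-bit hash works as long as reads and writes agree on it):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedKeySketch {
        // Simple 8-bit hash of the logical key
        static byte salt(byte[] key) {
            int h = 0;
            for (byte b : key) h = 31 * h + b;
            return (byte) h;
        }

        static byte[] saltedKey(byte[] key) {
            return Bytes.add(new byte[] { salt(key) }, key);
        }

        // A range scan fans out into 256 scans, one per salt bucket
        static List<Scan> saltedScans(byte[] start, byte[] stop) {
            List<Scan> scans = new ArrayList<Scan>();
            for (int s = 0; s < 256; s++) {
                byte[] prefix = new byte[] { (byte) s };
                scans.add(new Scan(Bytes.add(prefix, start),
                                   Bytes.add(prefix, stop)));
            }
            return scans;
        }
    }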
where we will store the increments. If
this
doesn't work, we are going to simply pull the increments out of the RO
and
do them in the application or in Flume.
@Tom Brown
I would be very interested to hear more about your solution of
aggregating the increments in another system
We used to do our updates through coprocessors, but it was not scalable. We
extracted the update code into a separate system, added row transaction
IDs, and haven't looked back.
For each incoming message, we compute the set of updates that message will
generate. With a batch of messages, we merge
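A sketch of the batching idea, assuming counters keyed by row (all names are hypothetical): deltas are merged client-side so each row sees one Increment per batch rather than one RPC per message.

    import java.util.Map;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedIncrementSketch {
        static void flush(HTable table, Map<String, Long> deltasByRow,
                          byte[] family, byte[] qualifier) throws Exception {
            for (Map.Entry<String, Long> e : deltasByRow.entrySet()) {
                Increment inc = new Increment(Bytes.toBytes(e.getKey()));
                inc.addColumn(family, qualifier, e.getValue());
                table.increment(inc); // one RPC per row per batch
            }
        }
    }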
To update this thread, this was caused by a bug: HBASE-9648.
--Tom
On Sat, Sep 21, 2013 at 9:49 AM, Tom Brown tombrow...@gmail.com wrote:
I am still receiving thousands of these log messages for the same region within a very short time frame. I have read the compaction documentation,
but have
It would help if you can show your RS log (via pastebin?) . Are there
frequent flushes for this region too?
On Tue, Sep 24, 2013 at 9:20 PM, Tom Brown tombrow...@gmail.com wrote:
I have a region that is very small, only 5MB. Despite its size, it has 24
store files. The logs show that it's
? Pseudo-Dist?
Fully-Dist?
Thanks,
JM
2013/9/24 Tom Brown tombrow...@gmail.com
There is one column family, d. Each row has about 10 columns, and each
row's total data size is less than 2K.
Here is a small snippet of logs from the region server:
http://pastebin.com/S2jE4ZAx
--Tom
for a compaction. Now, we need to find why...
JM
2013/9/24 Tom Brown tombrow...@gmail.com
My cluster is fully distributed (2 regionserver nodes).
Here is a snippet of log entries that may explain why it started:
http://pastebin.com/wQECif8k. I had to go back 2 days to find when it
started
to be fine.
-1 is the default value for TimeRangeTracker.maximumTimestamp.
Can you run:
hadoop fs -lsr hdfs://hdpmgr001.pse.movenetworks.com:8020/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/
Thanks,
JM
2013/9/24 Tom Brown tombrow...@gmail.com
1. Hadoop version is 1.1.2.
2
Same thing in pastebin: http://pastebin.com/tApr5CDX
On Tue, Sep 24, 2013 at 11:18 AM, Tom Brown tombrow...@gmail.com wrote:
-rw--- 1 hadoop supergroup 2194 2013-09-21 14:32
/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1
-rw--- 1 hadoop
Creating a table for that right now, will keep you posted very
shortly.
JM
2013/9/24 Tom Brown tombrow...@gmail.com
-rw--- 1 hadoop supergroup 2194 2013-09-21 14:32
/hbase/compound3/5ab5fdfcf2aff2633e1d6d5089c96aa2/d/014ead47a9484d67b55205be16802ff1
was looking for ;)
getFirstKey for this file seems to return null. So it might simply be an empty file, not necessarily a corrupted one.
2013/9/24 Tom Brown tombrow...@gmail.com
/usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -s -v -f /hbase/compound3
here if I'm able to reproduce. Have you tried Sergey's
workarround?
JM
2013/9/24 Tom Brown tombrow...@gmail.com
Yes, it is empty.
13/09/24 13:03:03 INFO hfile.CacheConfig: Allocating LruBlockCache with
maximum size 2.9g
13/09/24 13:03:03 ERROR metrics.SchemaMetrics: Inconsistent
have included a recent example, hot off the
server (see below)
Thanks,
Tom Brown
This particular region (c5f15027ae1d4aa1d5b6046aea6f63a4) is about 800MB,
comprised of 25 store files. Given that, I could reasonably expect up to 25
messages for the region. However, there were at least 388 (I didn't
hbase.client.retries.number
and hbase.client.pause are explained ?
Cheers
On Tue, Sep 17, 2013 at 10:34 AM, Tom Brown tombrow...@gmail.com wrote:
I have a region-server coprocessor that scans its portion of a table
based
on a request and summarizes the results (designed this way
I have a region-server coprocessor that scans its portion of a table based
on a request and summarizes the results (designed this way to reduce
network data transfer).
In certain circumstances, the HBase cluster gets a bit overloaded, and a
query will take too long. In that instance, the HBase
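For reference, the settings that bound how long such a call may run live in the client configuration; a sketch with hypothetical values, using the same keys Ted mentions below:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class TimeoutConfigSketch {
        public static Configuration clientConf() {
            Configuration conf = HBaseConfiguration.create();
            conf.setInt("hbase.rpc.timeout", 120000);      // per-RPC cap, ms
            conf.setInt("hbase.client.retries.number", 3); // retries before failing
            conf.setLong("hbase.client.pause", 1000);      // retry backoff base, ms
            return conf;
        }
    }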
Is it normal to receive 3-5 distinct Compaction Complete statuses for the
same region each second? For any individual region, it continuously
generates Compacting d in {theregion}... Compaction Complete statuses for
minutes or hours.
In that status message, what is d?
--Tom
On Wed, Sep 4, 2013
No, just one column family (called d, not surprisingly).
--Tom
On Wed, Sep 4, 2013 at 9:54 AM, Jimmy Xiang jxi...@cloudera.com wrote:
Here d should be the column family being compacted.
Do you have 3-5 column families of the same region being compacted?
On Wed, Sep 4, 2013 at 8:36 AM, Tom
If you're doing comparisons to remove duplicates, I'm not sure you'd get any benefit from doing the de-duplication at compaction time.
If you de-duplicate at write time, the same number of comparisons would
have to be made. There will be fewer disk writes (no duplicate data is
written) but
Chris,
I really appreciate your detailed fix description! I've run into
similar problems (due to old hardware and bad sectors) and could never
figure out how to fix a broken table. Hbck always seemed to just make
things worse until I would give up and recreate the table.
Can you publish your
references to attempt to detect when
it has been leaked-- but that mechanism does not appear to be working
in this case.
--Tom
On Fri, Sep 21, 2012 at 11:45 PM, Stack st...@duboce.net wrote:
On Fri, Sep 21, 2012 at 9:02 AM, Tom Brown tombrow...@gmail.com wrote:
Hi all,
I was having some odd
Hi all,
I was having some odd server pauses that appeared to be related to my
usage of a coprocessor endpoint. To help me monitor these, I attempted
to use the task monitor; Now I've got a memory leak and I suspect it's
because I'm not correctly marking each monitored task as completed
(YourKit
I have a similar situation. I have certain keys such that if I didn't have
the timestamps as part of the key I would have to have hundreds and even
thousands of duplicates.
However, I would recommend making sure the timestamp portion is fixed width (it will guarantee that your keys for a
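A sketch of a fixed-width timestamp component (8 bytes; since timestamps are non-negative, the plain encoding already sorts correctly, and subtracting from Long.MAX_VALUE gives newest-first order if that is what scans should return):

    import org.apache.hadoop.hbase.util.Bytes;

    public class TimestampKeySketch {
        static byte[] tsAscending(long ts) {
            return Bytes.toBytes(ts);               // oldest first
        }

        static byte[] tsNewestFirst(long ts) {
            return Bytes.toBytes(Long.MAX_VALUE - ts); // newest first
        }
    }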
If there are 9k possible entries in the lookup table, in order to achieve
space savings, the keys will need to be 1 or 2 bytes. For simplicity, let's
say you go with the 2 byte version. For 30 billion cells you will save 2
bytes per cell at best (from 4 bytes to 2) for a total savings of 60GB and
with exceptions (ClosedChannelException).
Eventually the exceptions are being thrown from openScanner, which
really doesn't sound good to me.
--Tom
On Mon, Sep 10, 2012 at 11:32 AM, Tom Brown tombrow...@gmail.com wrote:
Hi,
We have our system setup such that all interaction is done through
co
have the rowkey, the time range is less interesting as you will skip a lot of files already.
On Wed, Sep 12, 2012 at 11:52 PM, Tom Brown tombrow...@gmail.com wrote:
When I query HBase, I always include a time range. This has not been a
problem when querying recent data, but it seems
Hi,
We have our system setup such that all interaction is done through
co-processors. We update the database via a co-processor (it has the
appropriate logic for dealing with concurrent access to rows), and we
also query/aggregate via co-processor (since we don't want to send all
the data over
aggregations.
I'm interested in improving the design, so any suggestions will be appreciated.
Thanks in advance,
--Tom
On Mon, Sep 10, 2012 at 12:45 PM, Michael Segel
michael_se...@hotmail.com wrote:
On Sep 10, 2012, at 12:32 PM, Tom Brown tombrow...@gmail.com wrote:
We have our system setup
I suspect the hashed data will have a more uniform distribution among
all possible ranges, whereas structured data will likely fall into
ranges according to a bell curve (even though it has the possibility
of being in the full range, it usually won't be).
However, if your structured data really
We do numerical sorting within some of our tables. We put the numerical values as fixed-length byte arrays within the keys (and flipped the sign bit so negative values are lexicographically lower than positive values).
Of course, it's still part of the key so that technique doesn't work for
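The sign-bit flip is a one-liner; a sketch for longs:

    import org.apache.hadoop.hbase.util.Bytes;

    public class OrderedLongSketch {
        // XOR with Long.MIN_VALUE flips the sign bit, so two's-complement
        // values sort correctly when compared as unsigned byte arrays
        static byte[] encode(long v) {
            return Bytes.toBytes(v ^ Long.MIN_VALUE);
        }

        static long decode(byte[] b) {
            return Bytes.toLong(b) ^ Long.MIN_VALUE;
        }
    }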
(with the same TS) are
skipped.
My point remains, though: Do not rely on this.
(Though it will probably stay the way it is, because that is the most
efficient way to handle this in forward only scanners.)
-- Lars
From: Tom Brown tombrow...@gmail.com
I thought when multiple values with the same key, family, qualifier, and timestamp were written, the one that was written latest (as determined by
position in the store) would be read. Is that not the case?
--Tom
On Saturday, August 25, 2012, lars hofhansl lhofha...@yahoo.com wrote:
The prefix
I have a custom co-processor endpoint that handles aggregation of
various statistics for each region (the stats from all regions are
then merged together for the final result). Sometimes the amount of
data to aggregate is very large, and it takes longer than the exec
timeout to completely
I think you could do it manually by looking up all the different
regions and starting a separate scan for each region. Not quite as
handy as the built-in multi get, but essentially the same.
Of course, that leaves the question of processing-- If you're
processing it in a single-threaded
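A sketch of the manual fan-out, using the 0.94 client API (processing the per-region scans, single-threaded or otherwise, is left to the caller):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Pair;

    public class PerRegionScanSketch {
        static List<Scan> perRegionScans(HTable table) throws Exception {
            // Start/end keys come back aligned: one pair per region
            Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
            List<Scan> scans = new ArrayList<Scan>();
            for (int i = 0; i < keys.getFirst().length; i++) {
                scans.add(new Scan(keys.getFirst()[i], keys.getSecond()[i]));
            }
            return scans;
        }
    }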
Somebody will correct me if I'm wrong, but I think that for your example, you should use setTimeRange(0, 5) and setMaxVersions(1). It's my understanding that those settings will give you the 1 latest version from all applicable versions (0 <= timestamp <= 5).
Since it's pretty easy to set the
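One caveat worth adding: setTimeRange treats the upper bound as exclusive, so covering timestamps 0 through 5 needs a max of 6. A sketch:

    import org.apache.hadoop.hbase.client.Scan;

    public class TimeRangeScanSketch {
        static Scan latestInRange() throws Exception {
            Scan scan = new Scan();
            scan.setTimeRange(0L, 6L); // [0, 6) covers 0 <= ts <= 5
            scan.setMaxVersions(1);    // newest qualifying version only
            return scan;
        }
    }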
Can it notice the node is down sooner? If that node is serving an active
region (or if it's a datanode for an active region), that would be a
potentially large amount of downtime. With commodity hardware, and a large
enough cluster, there will always be a machine or two being rebuilt...
Thanks!
Is there any way to introduce a different ordering scheme from
the base comparable bytes? My use case is that I am using UTF-8 data
for my keys, and I would like to have scans use UTF-8 collation.
Could this be done by providing an alternate implementation of
points to not allow this.
Thanks anyway!
--Tom
On Fri, Jun 8, 2012 at 11:14 AM, Stack st...@duboce.net wrote:
On Fri, Jun 8, 2012 at 9:35 AM, Tom Brown tombrow...@gmail.com wrote:
Is there any way to introduce a different ordering scheme from
the base comparable bytes? My use case
I have read all I could find regarding what happens when you have
multiple cells with the exact same address (r/f/q/t), and I'm still a
little confused about the resolution.
If I create 2 puts for the exact same address (r/f/q/t), the last one wins?
Can I get different results from a scan as
? What
are your access patterns?
-Amandeep
On Friday, June 1, 2012 at 3:59 PM, Tom Brown wrote:
I have a table that holds rotating data. It has a TTL of 3600. For
some reason, when I scan the table I still get old cells that are much
older than that TTL.
I have tried issuing
I have a table that holds rotating data. It has a TTL of 3600. For
some reason, when I scan the table I still get old cells that are much
older than that TTL.
I have tried issuing a compaction request via the web UI, but that
didn't seem to do anything.
Am I misunderstanding the data model used
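For reference, TTL is a per-family setting; expired cells are normally filtered out of reads and only physically dropped by major compaction. A sketch, assuming a family named 'd':

    import org.apache.hadoop.hbase.HColumnDescriptor;

    public class TtlFamilySketch {
        public static HColumnDescriptor family() {
            HColumnDescriptor cf = new HColumnDescriptor("d");
            cf.setTimeToLive(3600); // seconds; cells older than this expire
            return cf;
        }
    }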
, 2012 at 8:30 PM, Tom Brown
tombrow...@gmail.com
wrote:
I don't think you can include a delete with a put and keep it atomic.
You could include a null version of the column with your put, though,
for a similar effect.
--Tom
On Tue, May 22, 2012 at 10:55
I don't think you can include a delete with a put and keep it atomic.
You could include a null version of the column with your put, though,
for a similar effect.
--Tom
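A sketch of the null-version idea (family and qualifier names are hypothetical): the empty value rides along in the same atomic Put, and readers must treat a zero-length value as absent.

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NullColumnSketch {
        static Put replaceRow(byte[] row) {
            Put put = new Put(row);
            // "Null out" the old column and write the new one in one atomic Put
            put.add(Bytes.toBytes("d"), Bytes.toBytes("obsolete"), new byte[0]);
            put.add(Bytes.toBytes("d"), Bytes.toBytes("current"),
                    Bytes.toBytes("new value"));
            return put;
        }
    }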
On Tue, May 22, 2012 at 10:55 AM, Kristoffer Sjögren sto...@gmail.com wrote:
Hi
I'm trying to use Put operations to replace
Michael,
This is good info. I wish you'd post what the "more" is, though.
--Tom
On Mon, May 21, 2012 at 4:30 PM, Michael Segel
michael_se...@hotmail.com wrote:
Hi,
Seems we just had someone talk about this just the other day...
1) 8GB of memory isn't enough to run both M/R and HBase.
Ok,
I know that regions can split (either manually, or automatically), but
is there any process whereby regions that have previously split will
combine (perhaps when one region shrunk)?
If so, what are the conditions that cause it, and does it happen
automatically or only via a manual process?
I made a very similar mistake myself the other day when trying to reset my
cluster. What finally solved it was deleting the temp directory used by my
data nodes (in my case I wanted to lose all my data, so it was ok to
delete everything... In your case, you may have to figure out how to export
For our solution we are doing some aggregation on the server via
coprocessors. In general, for each row there are 8 columns: 7 columns
that contain numbers (for summation) and 1 column that contains a
hyperloglog counter (about 700 bytes). Functionally, this solution
works well and ought to scale
All,
I'm writing an OLAP cube database and I can implement the storage in
one of two schemas, and I don't know if there's any unexpected
performance trade-offs I'm not aware of.
Each row represents a unique cell in the cube, with about 5 columns
for each row. The row key format is a set of
- Original Message -
From: Tom Brown tombrow...@gmail.com
To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org
Cc:
Sent: Tuesday, April 10, 2012 3:53 PM
Subject: Re: Add client complexity or use a coprocessor?
Andy,
I have attempted to use
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
- Original Message -
From: Tom Brown tombrow...@gmail.com
To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org
Cc:
Sent: Tuesday, April 10, 2012 3:53 PM
Subject: Re: Add client complexity
mostly there. If you wanted to be fancy you could actually maintain the bloom as a bunch of separate columns to avoid update
contention.
On Apr 9, 2012 10:14 PM, Tom Brown tombrow...@gmail.com wrote:
Andy,
I am a big fan of the Increment class. Unfortunately, I'm not doing
simple increments
, some sort of atomic bitfield.
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
- Original Message -
From: Tom Brown tombrow...@gmail.com
To: user@hbase.apache.org
Cc:
Sent: Monday, April 9, 2012 10:14 PM
- Original Message -
From: Tom Brown tombrow...@gmail.com
To: user@hbase.apache.org
Cc:
Sent: Monday, April 9, 2012 9:48 AM
Subject: Add client complexity or use a coprocessor?
To whom it may concern,
Ignoring the complexities of gathering the data, assume that I