Hi, thanks! But for loading data into HBase, will adding a hash to the rowkey improve
performance?
Regards,
Rams
On 12-Sep-2012, at 8:38 AM, lars hofhansl lhofha...@yahoo.com wrote:
It depends. If you do not need to perform range scans along (prefixes of) your
row keys, you can prefix the row key
Thank you Ulrich, that looks like an interesting approach. I think I
will give that a go.
~ Jeroen
2012/9/10 Ulrich Staudinger ustaudin...@activequant.com:
my AQ Master Server might be of interest to you. I have an embedded
HBase server in it; it's very straightforward to use:
I think yes, because it will avoid hotspotting. I think we have a good post
on that topic on the Sematext Blog.
Otis
--
Performance Monitoring - http://sematext.com/spm
On Sep 12, 2012 3:08 AM, Ramasubramanian
ramasubramanian.naraya...@gmail.com wrote:
Hi thanks! But for loading data into hbase,
I wouldn't 'prefix' the hash to the key, but actually replace the key with a
hash and store the unhashed key in a column.
But that's a different discussion.
In a nutshell, the problem is that there are a lot of potential use cases where
you want to store data in a sequence dependent fashion.
Hi all,
I'm trying to find the sweet spot for the cache size and batch size Scan()
parameters.
I'm scanning one table using HTable.getScanner() and iterating over the
ResultScanner retrieved.
I did some testing and got the following results:
For scanning *100* rows:
Cache | Batch | Total
How much memory do you have?
What's the size of the underlying row?
What does your network look like? 1GBe or 10GBe?
There's more to it, and I think that you'll find that YMMV on what is an
optimum scan size...
HTH
-Mike
On Sep 12, 2012, at 7:57 AM, Amit Sela am...@infolinks.com wrote:
Hi
I allocate 10GB per RegionServer.
An average row size is ~200 Bytes.
The network is 1GbE.
It would be great if anyone could elaborate on the difference between Cache
and Batch parameters.
Thanks.
On Wed, Sep 12, 2012 at 4:04 PM, Michael Segel michael_se...@hotmail.com wrote:
How much memory do
Hi there,
See this for info on the block cache in the RegionServer..
http://hbase.apache.org/book.html
9.6.4. Block Cache
… and see this for batching on the scan parameter...
http://hbase.apache.org/book.html#perf.reading
11.8.1. Scan Caching
On 9/12/12 9:55 AM, Amit Sela
Hi,
I was doing some load testing on my cluster. I am writing to HBase (version
0.92.0) from 20 threads simultaneously. After running the program for some
time, one of my machines became unresponsive. I checked the GC log and found
occurrences of both "concurrent mode failure" and "promotion failed"
Try with less than 70% occupancy for CMS initiation… for the permgen failure, try increasing the
permgen size.
Another option is to try DoubleBuffer.
./zahoor
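For reference, the CMS suggestion above corresponds to JVM flags along these lines in hbase-env.sh (the values are illustrative starting points, not tuned recommendations):

```
# hbase-env.sh — illustrative CMS flags; tune the values for your heap
export HBASE_OPTS="$HBASE_OPTS \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:MaxPermSize=256m"
```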
On 12-Sep-2012, at 8:39 PM, Amlan Roy amlan@cleartrip.com wrote:
UseConcMarkSweepGC
Ignore the permgen advice. I mistook the promotion failure for a permgen failure.
./zahoor
On 12-Sep-2012, at 8:39 PM, Amlan Roy amlan@cleartrip.com wrote:
UseConcMarkSweepGC
If you use a collision-free hashing algorithm you're right. Otherwise you'd have KVs
suddenly grouped into rows that weren't part of the same row.
With hash prefixing you can use a fast and simple hashing algorithm, because
you do not need the hash to be unique.
Depends again on various aspects.
We are looking for one more potential talk, so please email Otis and me (
j...@cloudera.com) directly with any proposals.
Thanks!
Jon.
On Tue, Sep 11, 2012 at 10:45 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Hi,
I don't think this was mentioned on the ML yet, but for those coming
Thanks for digging, Julian.
Looks like we need to support BigDecimal in HbaseObjectWritable
Actually once a test is written for BigDecimalColumnInterpreter, it would
become much easier for anyone to debug this issue.
On Wed, Sep 12, 2012 at 9:27 AM, Julian Wissmann
MD5 should work; SHA-1, while it may theoretically have collisions, none has been
found.
Then there's SHA-2...
I don't disagree with your assertion, however... it causes the key to be longer
than it needs to be.
If you insist on doing this... then take the MD5 hash, truncate it to 4
Any help on this one please.
On Tue, Sep 11, 2012 at 11:19 AM, Jothikumar Ekanath kbmku...@gmail.com wrote:
Hi Stack,
Thanks for the reply. I looked at the code and I am having
a very basic confusion about how to use it correctly. The code I wrote
earlier has the following input
Not insisting :)
MD5 and SHA-1 would be reasonable and can be used to replace the key as you say.
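Both variants discussed in this thread come down to a few lines. A minimal sketch in Python (the helper names and the 4-byte truncation are illustrative, not from any HBase API):

```python
import hashlib

def salted_key(key):
    # Variant 1: prefix the key with the first 4 bytes (8 hex chars) of its
    # MD5 so sequential keys spread across regions. Prefix collisions are
    # harmless here because the full original key is still appended.
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:8]
    return prefix + "-" + key

def hashed_key(key):
    # Variant 2: replace the key entirely with its hash and store the
    # original, unhashed key in a column.
    return hashlib.md5(key.encode("utf-8")).hexdigest()

print(salted_key("user123|2012-09-12"))
```

Either way, range scans over the original key order are lost, which is why this only makes sense when you don't need prefix range scans.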
- Original Message -
From: Michael Segel michael_se...@hotmail.com
To: user@hbase.apache.org; lars hofhansl lhofha...@yahoo.com
Cc:
Sent: Wednesday, September 12, 2012 9:49 AM
Subject: Re:
On Tue, Sep 11, 2012 at 6:56 AM, Shengjie Min shengjie@gmail.com wrote:
1. If you do a Hive query against the row key, like select * from
hive_hbase_test where key='blabla', this will utilize the HBase row-key
index, which gives you a very quick, nearly real-time response, just like HBase
does.
I have captured some logs from what is happening during one of these pauses.
http://pastebin.com/K162Einz
Can someone help me figure out what's actually going on from these logs?
--- My interpretation of the logs ---
As you can see at the start of the logs, my coprocessor for updating
the data
Inline
On Wed, Sep 12, 2012 at 10:40 AM, Tom Brown tombrow...@gmail.com wrote:
I have captured some logs from what is happening during one of these pauses.
http://pastebin.com/K162Einz
Can someone help me figure out what's actually going on from these logs?
--- My interpretation of the
Hi All,
Can someone please explain, in layman's terms, what a rowkey is and how to construct the rowkey (in
the hashing case) to load data into HBase faster.
Regards,
Rams
On 12-Sep-2012, at 10:40 PM, lars hofhansl lhofha...@yahoo.com wrote:
Not insisting :)
MD5 and SHA-1 would be reasonable and can be used
I attempted to write this up here:
http://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html
- Original Message -
From: Ramasubramanian ramasubramanian.naraya...@gmail.com
To: user@hbase.apache.org user@hbase.apache.org
Cc: Michael Segel michael_se...@hotmail.com;
Hi Doug,
That is where I took my code from initially; I was not able to
notice anything different from there. I know there is something wrong with
the key-in/key-out types in my code, but I am not able to figure it out.
I have given below what I am using. Do you see anything wrong in there?
Cool!
I'm sure I'll find some time to dig into it early next week if nobody else
lusts after it ;-)
Cheers
2012/9/12 Ted Yu yuzhih...@gmail.com
Thanks for digging, Julian.
Looks like we need to support BigDecimal in HbaseObjectWritable
Actually once a test is written for
The WAL is just there for recovery. Reads will meet the MemStore on their read
path; that's how LSM trees work.
On Wed, Sep 12, 2012 at 11:15 PM, Jason Huang jason.hu...@icare.com wrote:
This might be a naive question but I am not able to find a good answer
from searching online.
The
So - I guess at the time of the query we don't know if the data is in
Memstore or in the RegionServer. In order to ensure we get the most
recent version of data, every Hbase Read query will first go to
Memstore and see if the data is there, and then go to RegionServers if
it couldn't find that
I think you misunderstand the concept of the memstore. That's just the name of
the temporary in-memory storage. Each region has its own memstore, and thus
it's located on the regionserver itself.
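The read path being described can be sketched with plain dictionaries (a toy model of an LSM read, not HBase code): a read consults the in-memory memstore first, then falls back to the flushed store files.

```python
# Toy model: recent, unflushed writes live in the memstore;
# flushed data lives in store files (newest first).
memstore = {"row2": "new-value"}
store_files = [{"row1": "v1", "row2": "old-value"}]

def get(rowkey):
    if rowkey in memstore:        # most recent data wins
        return memstore[rowkey]
    for sf in store_files:        # then consult store files in order
        if rowkey in sf:
            return sf[rowkey]
    return None

print(get("row2"))  # 'new-value' — the memstore shadows the flushed copy
print(get("row1"))  # 'v1'
```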
On Wed, Sep 12, 2012 at 11:24 PM, Jason Huang jason.hu...@icare.com wrote:
So - I guess at the time of the
I see now.
Thanks for the quick response and clear explanation.
Jason
On Wed, Sep 12, 2012 at 5:28 PM, Adrien Mogenet
adrien.moge...@gmail.com wrote:
I think you misunderstand concept of memstore. That's just the name of
the temporary in-memory storage. Each region has its own memstore, and
For each file, there is a time range. When you scan/search, the file is
skipped if there is no overlap between the file's time range and the time range
of the query. As there are other parameters as well (row distribution,
compaction effects, cache, bloom filters, ...) it's difficult to know in
It seems like the internal logic for handling a time range is two-part:
First, as you said, each file contains the minimum and maximum
timestamps contained within. This provides a very rough filter for the
data, but if your data is right, the effect can be huge. Second, a
time range acts a
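The coarse file-level filter described here is just an interval-overlap test; a sketch (the function name is made up):

```python
def file_overlaps_query(file_min_ts, file_max_ts, query_min_ts, query_max_ts):
    # A store file can be skipped entirely when its [min, max] timestamp
    # range does not intersect the scan's time range; otherwise the file
    # must be read and rows filtered individually.
    return file_min_ts <= query_max_ts and file_max_ts >= query_min_ts

# A file holding only old data is skipped for a recent-time-range query:
print(file_overlaps_query(0, 100, 200, 300))    # False -> skip
print(file_overlaps_query(150, 250, 200, 300))  # True  -> must read
```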
But when the ResultScanner executes, wouldn't it already query the servers for
all the rows matching the start key? I am trying to avoid reading all the
blocks from the file system that match the keys.
On Wed, Sep 12, 2012 at 3:59 PM, Doug Meil doug.m...@explorysmedical.com wrote:
Hi there,
If
No. By default each call to ClientScanner.next(...) incurs an RPC call to the
HBase server, which is why it is important to enable scanner caching (as
opposed to batching) if you expect to scan many rows.
By default scanner caching is set to 1.
From: Mohit
On Wed, Sep 12, 2012 at 4:48 PM, lars hofhansl lhofha...@yahoo.com wrote:
No. By default each call to ClientScanner.next(...) incurs an RPC call to
the HBase server, which is why it is important to enable scanner caching
(as opposed to batching) if you expect to scan many rows.
By default
If we set caching to N, the region server will attempt to scan N rows before
next() returns.
So if you typically early-out of a scan at the client, the server will on
average scan N/2 rows too many, which you have to trade off against the number of RPC
requests saved by caching.
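A rough model of how caching and batch interact (a simplification that ignores size limits and partial-row edge cases): batch splits a wide row into several partial Results, and caching is the number of Results fetched per RPC.

```python
import math

def rpc_count(rows, cols_per_row=1, caching=1, batch=None):
    # With a batch limit, each row yields ceil(cols_per_row / batch)
    # partial Results; caching counts Results per round trip.
    results = rows if batch is None else rows * math.ceil(cols_per_row / batch)
    return math.ceil(results / caching)

print(rpc_count(100))                                  # default caching=1 -> 100 RPCs
print(rpc_count(100, caching=50))                      # -> 2 RPCs
print(rpc_count(100, cols_per_row=30, caching=50, batch=10))  # 300 partial Results -> 6 RPCs
```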
Good numbers
@Tom
I think your guess is correct. When the HFile cannot be skipped because the max and
min TS overlap with the given time range, that file will be scanned fully and
certain rows will be filtered out. Those are read from HDFS.
When you do the reseeks, many such reads can be avoided.. Remember that
Hi,
do you have a script in Python for rack-awareness configuration?
Thanks!
beatls
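There isn't a canonical one shipped with Hadoop, but a topology script is only a few lines. A minimal sketch (the subnet-to-rack mapping below is made up; Hadoop invokes the script configured via topology.script.file.name with one or more host/IP arguments and expects one rack path per argument on stdout):

```python
#!/usr/bin/env python
import sys

# Hypothetical subnet-to-rack mapping — replace with your own network layout.
RACKS = {
    "10.1.1.0": "/rack1",
    "10.1.2.0": "/rack2",
}
DEFAULT_RACK = "/default-rack"

def rack_for(host):
    # Map by the first three octets when the argument is an IP address;
    # hostnames (or unknown subnets) fall back to the default rack.
    parts = host.split(".")
    if len(parts) == 4 and all(p.isdigit() for p in parts):
        subnet = ".".join(parts[:3]) + ".0"
        return RACKS.get(subnet, DEFAULT_RACK)
    return DEFAULT_RACK

if __name__ == "__main__":
    print(" ".join(rack_for(h) for h in sys.argv[1:]))
```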
On Thu, Sep 13, 2012 at 5:52 AM, Tom Brown tombrow...@gmail.com wrote:
When I query HBase, I always include a time range. This has not been a
problem when querying recent data, but it seems to be an issue
hi,
where can I find the GC log?
I am a newcomer.
Thanks!
beatls
On Wed, Sep 12, 2012 at 11:09 PM, Amlan Roy amlan@cleartrip.com wrote:
Hi,
I was doing some load testing on my cluster. I am writing to HBase (version
0.92.0) from 20 threads simultaneously. After running the