Hi,
is it possible to count the rows in a table in a specified time range?
I found [--range=[startKey],[endKey]] but I need to count between
specified timestamps.
Regards Hansi
Hi Ted,
I had read those, but I'm confused about how this will affect non-HBase
HDFS data. With HDFS checksumming off won't it affect data integrity?
Krishna
On 24 April 2014 15:54, Ted Yu yuzhih...@gmail.com wrote:
Please take a look at the following:
HBase using its own checksum handling doesn't directly affect HDFS. HDFS will
still maintain its checksum info. The difference is at read time: HBase will
open the reader with checksum validation off and will do checksum
validation on its own. So using HBase-handled checksums in a cluster
should not
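(As an aside, and an assumption on my part rather than from this thread: if I'm
reading HBASE-5074 right, the switch being discussed is
hbase.regionserver.checksum.verify in hbase-site.xml, along the lines of:
<property>
  <name>hbase.regionserver.checksum.verify</name>
  <value>true</value>
</property>
Please verify against the docs for your HBase version.)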
Hi, Hansi
Take a look at
https://github.com/apache/hbase/blob/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
.
You can modify it and use scan.setTimeRange(long minStamp, long maxStamp);
that will give you the option to count rows in a specific time range.
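A minimal sketch of that kind of count done client-side, assuming the 0.94-era
client API (the table name and the timestamps below are hypothetical):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class TimeRangeRowCount {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");        // hypothetical table name
    Scan scan = new Scan();
    scan.setTimeRange(1398700000000L, 1398800000000L); // [minStamp, maxStamp)
    scan.setFilter(new FirstKeyOnlyFilter());          // fetch just one KV per row
    ResultScanner scanner = table.getScanner(scan);
    long count = 0;
    for (Result r : scanner) {
      count++;
    }
    scanner.close();
    table.close();
    System.out.println("rows in time range: " + count);
  }
}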
Another
Thank you for your reply Anoop.
However, the confusion is, unfortunately, still there because of the
following (from here:
http://hbase.apache.org/book.html#perf.hdfs.configs.localread
):
For optimal performance when short-circuit reads are enabled, it is
recommended that HDFS checksums are
Hi
We have a requirement in which we have to get the scan results sorted on a
particular column.
e.g. *Get Details of Authors sorted by their Publication Count. Limit: 1000*
*Row Key is an MD5 hash of Author Id*
Number of records: 8.2 million rows for 3 years of data (sample dataset; actual
data set
Have you looked at Apache Phoenix ?
Cheers
On Apr 29, 2014, at 2:13 AM, Vikram Singh Chandel
vikramsinghchan...@gmail.com wrote:
Hi
We have a requirement in which we have to get the scan results sorted on a
particular column.
e.g. *Get Details of Authors sorted by their Publication
Yes, we have looked, but that was back in November/December 2013 when it was
having a lot of issues, because of which we decided not to use it. We
built our solution design on HBase alone. So we are looking for a better
solution.
Thanks
On Tue, Apr 29, 2014 at 5:46 PM, Ted Yu yuzhih...@gmail.com
Hi,
I faced a strange issue today.
I stopped my cluster to merge some regions. When I tried to restart it, one
server got stuck.
If I try to stop the cluster, this server ignores the request and stays stuck.
Then if I restart, the cluster comes back without this server, saying that
it's already
Can you give us the versions of hbase and zookeeper?
bq. Connexion ré-initialisée par le correspondant
Google translate says:
Connection re-initialized by the corresponding
Is the above correct?
On Tue, Apr 29, 2014 at 7:37 AM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
Hi,
I face
HBase 0.94.19. I have the RC1 jars but I guess they are the same as the
release.
ZK 3.4.3
And for the translation, I would say something like "Connection reset by
peer"? ;)
Seems that ZK closes the connection for this specific RS. I'm not really
getting why...
JM
2014-04-29 10:43 GMT-04:00 Ted Yu
Hi all,
sorry for the late answer.
I configured hbase-site.xml like this:
<property>
  <name>dfs.client.socketcache.capacity</name>
  <value>0</value>
</property>
<property>
  <name>dfs.datanode.socket.reuse.keepalive</name>
  <value>0</value>
</property>
and restarted the hbase master and all
On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose hansi.kl...@web.de wrote:
Hi all,
sorry for the late answer.
I configured hbase-site.xml like this:
<property>
  <name>dfs.client.socketcache.capacity</name>
  <value>0</value>
</property>
<property>
Hi Vikram,
I see you sent the Phoenix mailing list a question back in Dec on how to
use Phoenix 2.1.2 with Hadoop 2 for HBase 0.94. It looks like you were having
trouble building Phoenix with the hadoop2 profile. In our 3.0/4.0 releases we
bundle the Phoenix jars pre-built with both hadoop1 and hadoop2, so
Hi all,
What I want to do here is first create a column
family, and then create a column for data to be added later. I
cannot use the PUT command, as it requires a value parameter (because the idea
of the application is to first initialize the table, and data will be added
When creating your table, you specify the column families in the table (code
snippet from TestFromClientSide3.java):
HTableDescriptor htd = new HTableDescriptor(hTable.getTableDescriptor());
HColumnDescriptor hcd = new HColumnDescriptor(htd.getFamily(FAMILY));
htd.addFamily(hcd);
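And here is a minimal, self-contained sketch of creating a table with one
family up front (the table and family names are my own placeholders; assumes
the 0.94-era HBaseAdmin API):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTableExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor htd = new HTableDescriptor("mytable"); // hypothetical name
    htd.addFamily(new HColumnDescriptor("cf"));             // family created up front
    admin.createTable(htd);                                 // no columns needed yet
    admin.close();
  }
}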
On Tue, Apr 29, 2014 at 1:54 AM, Krishna Rao krishnanj...@gmail.com wrote:
Thank you for your reply Anoop.
However, the confusion is, unfortunately, still there because of the
following (from here:
http://hbase.apache.org/book.html#perf.hdfs.configs.localread
):
For optimal performance when
Yes, this code is correct. What the code does here is only create a column
family. What I want to do is create several columns inside the column
family. In this documentation
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html
for the Class HTableDescriptor it
HBase does not have the concept of static columns.
You just have groups, the column families,
and then each row has a set of qualifiers.
So you don't need to specify the set of columns (qualifiers) on
initialization;
you just add/update rows with the specified column,
so you end up doing something
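like this minimal sketch (the table, family, and qualifier names are
placeholders; assumes the 0.94-era client API):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DynamicColumnExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    Put put = new Put(Bytes.toBytes("row1"));
    // the qualifier "newColumn" springs into existence on first write;
    // no schema change is needed
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("newColumn"), Bytes.toBytes("value"));
    table.put(put);
    table.close();
  }
}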
On Tue, Apr 29, 2014 at 11:53 AM, Stack st...@duboce.net wrote:
On Tue, Apr 29, 2014 at 1:54 AM, Krishna Rao krishnanj...@gmail.com wrote:
Thank you for your reply Anoop.
However, the confusion is, unfortunately, still there because of the
following (from
Thank you for the immediate reply. (Y)
On Wed, Apr 30, 2014 at 12:44 AM, Matteo Bertozzi
theo.berto...@gmail.com wrote:
HBase does not have the concept of static columns.
You just have groups, the column families,
and then each row has a set of qualifiers.
So you don't need to specify the
We had this issue again in production. We had to shut down the region
server. Restarting didn't help since this RS was bombarded with write requests
and coprocessor-execution requests, which made it open regions at the rate
of 1 region every 2 minutes or so.
Do you think it's related to this jira:
Anything in the zk log when the RS connects, JMS?
On Tue, Apr 29, 2014 at 7:37 AM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
Hi,
I face a strange issue today.
I stopped my cluster to merge some regions. When I tried to restart it, one
server got stuck.
If I try to stop the
1. 2014-04-29 10:32:28,876 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
This client just lost it's session with ZooKeeper, will automatically
reconnect when needed.
What about ZK metrics, maybe it's timing out or something (I mean echo
srvr | nc
Hey all. I have some questions regarding row key and column design.
We want to calculate some metrics based on our page views broken down
by minute, day, month and year. We also want this broken down by country
and to have the ability to filter by some other attributes such as the
sex of the user or
Hey all. I have some questions regarding row key and column design.
We want to calculate some metrics based on our page views broken down
by hour, day, month and year. We also want this broken down by country
and to have the ability to filter by some other attributes such as the
sex of the user or
The initial row key design would result in a hot spot (w.r.t. writes).
Is user id part of row key ?
Have you looked at Sematext's HBaseWD library ?
Lastly, Apache Phoenix may fit your needs.
On Tue, Apr 29, 2014 at 3:25 PM, Software Dev static.void@gmail.com wrote:
Hey all. I have some
Sorry, I thought I deleted this message as it got cut off halfway when
I was writing it. Can you please look at my other post?
On Tue, Apr 29, 2014 at 3:36 PM, Ted Yu yuzhih...@gmail.com wrote:
The initial row key design would result in hot spot (w.r.t. writes)
Is user id part of row key ?
Someone mentioned hotspotting in another post. I guess I could
reverse the row keys to prevent this?
On Tue, Apr 29, 2014 at 3:34 PM, Software Dev static.void@gmail.com wrote:
Hey all. I have some questions regarding row key and column design.
We want to calculate some metrics based
I've created a similar system using a rowkey like: (hash of date) - date
The downside is it still has a hotspot when inserting, but when reading a range
of time it does not. My use case was geared towards speeding up lots of reads.
Column qualifiers are just the collection of items you are
Here are some links that helped me design my keys...
http://www.appfirst.com/blog/best-practices-for-managing-hbase-in-a-high-write-environment/
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
The downside is it still has a hotspot when inserting, but when reading a
range of time it does not
How can you do a scan query between dates when you hash the date?
Column qualifiers are just the collection of items you are aggregating on.
Values are increments. In your case qualifiers
You don't do a scan, you do a series of gets, which I believe you can batch
into one call.
last 5 days query in pseudocode
res1 = Get( hash(2014-04-29) + 2014-04-29)
res2 = Get( hash(2014-04-28) + 2014-04-28)
res3 = Get( hash(2014-04-27) + 2014-04-27)
res4 = Get( hash(2014-04-26) + 2014-04-26)
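A runnable version of that pseudocode might look like the following (the table
name, salt function, and exact key layout are my assumptions, not from this
thread; 0.94-era API):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedDailyGets {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "metrics"); // hypothetical table name
    List<Get> gets = new ArrayList<Get>();
    for (String day : new String[] {"2014-04-29", "2014-04-28",
                                    "2014-04-27", "2014-04-26"}) {
      // example salt: hex of the date string's hashCode; any stable hash works
      byte[] salt = Bytes.toBytes(Integer.toHexString(day.hashCode()));
      gets.add(new Get(Bytes.add(salt, Bytes.toBytes(day)))); // key = hash(date) + date
    }
    Result[] results = table.get(gets); // batched into one call
    System.out.println("fetched " + results.length + " daily rows");
    table.close();
  }
}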
bq. I believe you can batch into one call.
See the following API in HTable for batching Gets:
public Result[] get(List<Get> gets) throws IOException {
On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) cren...@kbb.com wrote:
You don't do a scan, you do a series of gets, which I believe
Yes. See total_usa vs. total_female_usa above. Basically you have to
pre-store every level of aggregation you care about.
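For instance, a single page view by a female user in the USA might bump
several pre-stored counters at once; a sketch with hypothetical table, family,
and row-key choices (0.94-era API):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;

public class PreAggregatedCounters {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "metrics");                 // hypothetical name
    Increment inc = new Increment(Bytes.toBytes("2014-04-29")); // hypothetical key
    // bump every aggregation level this event belongs to
    inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("total_usa"), 1L);
    inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("total_female_usa"), 1L);
    table.increment(inc);
    table.close();
  }
}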
Ok, I think this makes sense. Gets a bit hairy when doing, say, a
shitload of gets though.. no?
On Tue, Apr 29, 2014 at 4:43 PM, Rendon, Carlos (KBB) cren...@kbb.com wrote:
Gets a bit hairy when doing, say, a shitload of gets though.. no?
If by hairy you mean the code is ugly, it was written for maximal clarity.
I think you'll find a few sensible loops makes it fairly clean.
Otherwise I'm not sure what you mean.
-Original Message-
From: Software Dev
After some time the node finally exited by itself (logs below) and I have
been able to restart it correctly... The only thing I found in the ZK logs
was this:
2014-04-29 11:28:00,026 [myid:3] - WARN [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@209] - Too many connections from /
Hi, All!
I need help!
I ran the Java client to write 3 million rows into HBase,
but when it had written almost 1 million, the process blocked without any exception.
Does anyone know the possible reason? So I can find the solution.
Thanks all!
By the way, the HBase Version is 0.94.6-cdh4.5.0!
Any logs?
Garbage collection on the server side? Network issue? Swap?
Please share your master and region servers logs so we can provide feedback.
JM
2014-04-29 21:26 GMT-04:00 jingych jing...@neusoft.com:
Hi, All!
I need help!
I ran the Java client to write 3 million rows into HBase,
Thanks JM.
But the log is too big; how can I post the log file?
Queries from HBase are slower too.
The network is OK, I'm sure.
Does GC have a log file? And how can I check swap?
Sorry, I'm a rookie.
jingych
From: Jean-Marc Spaggiari
Date: 2014-04-30 09:30
To: user; jingych
Subject:
Look at when your data injection stopped, and look at the logs close to
that date and see if there are exceptions or anything. You can paste part of
those logs into pastebin.com and post the link here.
You might want to run some Performance Evaluation tests against your
cluster to see if it gives
I found the compaction action in the region server log file:
2014-04-29 16:23:25,373 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Finished memstore flush of ~41.8 M/43820608, currentsize=13.9 M/14607128 for
region gspt_jbxx,,1398754604428.d6fd8d39289985adda9a3e048b92a24b. in 2325ms,
this is the piece of code related to your stack trace:
// check periodically to see if a system stop is requested
if (Store.closeCheckInterval > 0) {
bytesWritten += kv.getLength();
if (bytesWritten > Store.closeCheckInterval) {
Thanks JM!
Master log:
2014-04-29 16:28:21,973 INFO org.apache.hadoop.hbase.master.AssignmentManager:
The master has opened the region
gspt_jbxx,,1398759815278.b94a0838f87658b577a0d94379a9e02d. that was online on
cdh-datanode3,60020,1397641384262
2014-04-29 16:28:22,100 INFO
Nothing against your code. I just meant that if we are doing a scan,
say for hourly metrics across a 6-month period, we are talking about
4K+ gets. Is that something that can easily be handled?
On Tue, Apr 29, 2014 at 5:08 PM, Rendon, Carlos (KBB) cren...@kbb.com wrote:
Gets a bit hairy when doing
As I said this afternoon:
See the following API in HTable for batching Gets:
public Result[] get(List<Get> gets) throws IOException {
Cheers
On Tue, Apr 29, 2014 at 7:45 PM, Software Dev static.void@gmail.com wrote:
Nothing against your code. I just meant that if we are doing a scan
Ok, didn't know if the sheer number of gets would be a limiting factor. Thanks.
On Tue, Apr 29, 2014 at 7:57 PM, Ted Yu yuzhih...@gmail.com wrote:
As I said this afternoon:
See the following API in HTable for batching Gets:
public Result[] get(List<Get> gets) throws IOException {
Cheers
I guess you can pre-split tables manually, which avoids hotspotting.
On Tue, Apr 29, 2014 at 8:08 PM, Software Dev static.void@gmail.com wrote:
Any improvements in the row key design?
If I always know we will be querying by country, could/should I prefix
the row key with the country to