Re: Warning messages in hbase logs

2013-10-24 Thread Kiru Pakkirisamy
Vimal, In my case, for a single client my queries take less than a second (sub-second performance is what we were shooting for). But the same queries, when run concurrently, give completely degraded performance. That is the reason I wrote that no-op test case which I have attached in the

filtering rows for column values not null

2013-10-24 Thread Ted
Hi, I have a relatively simple situation. As an example, I have a table of Users with first and last name. I set a FilterList on a scan and add a SingleColumnValueFilter with column qualifier=firstName, CompareOp.EQUAL, and value=bob. The problem is, I'm getting bob as well as anyone without a
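
A hedged reconstruction of the setup described above (the column family and values are illustrative, not taken from the original post); by default SingleColumnValueFilter includes rows that do not contain the tested column at all, which is why rows without a firstName come back alongside bob:

    Scan scan = new Scan();
    FilterList filters = new FilterList();
    filters.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cf"),          // column family (assumed)
        Bytes.toBytes("firstName"),   // qualifier from the question
        CompareOp.EQUAL,
        Bytes.toBytes("bob")));
    scan.setFilter(filters);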

RE: Hbase 0.96 and Hadoop 2.2

2013-10-24 Thread Paul Honig
Thanks, that fixed the problem for me! -Original Message- From: Tianying Chang [mailto:tich...@ebaysf.com] Sent: Wednesday, October 23, 2013 8:35 PM To: user@hbase.apache.org Subject: RE: Hbase 0.96 and Hadoop 2.2 What is your HBase version and Hadoop version? There is an RPC break

Re: How to create HTableInterface in coprocessor?

2013-10-24 Thread yonghu
Ok, I will give it a try. Regards! Yong On Wed, Oct 23, 2013 at 11:53 PM, Ted Yu yuzhih...@gmail.com wrote: Yong: I have attached the backport to HBASE-9819. If you can patch your build and see if it fixes the problem, that would be great. On Tue, Oct 22, 2013 at 2:58 PM, Ted Yu

Re: How can I insert large image or video into HBase?

2013-10-24 Thread Julian Zhou
Interesting topic about constructing a file/block system on top of HBase. Similar to Facebook Haystack or Taobao TFS, targeting small-file management? It would be great if you could open source it on GitHub with some benchmarks, Roman. Sounds like a question of how to best utilize and configure HBase data server's

Re: filtering rows for column values not null

2013-10-24 Thread Jean-Marc Spaggiari
Hi Ted, I'm not sure I get you. HBase will not store any cell in the system if there is no content. So if you do a scan, you will get only the cells where the content is not null. There is no need for any filter here. Can you please detail what you put in the table and what cells you are

Re: filtering rows for column values not null

2013-10-24 Thread Toby Lazar
I think Ted is looking for the SingleColumnValueFilter.setFilterIfMissing() method. Try setting that to true. Toby *** Toby Lazar Capital Technology Group Email: tla...@capitaltg.com Mobile: 646-469-5865 *** On Thu, Oct 24, 2013
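
Applied to the scan sketched earlier, the suggested fix is roughly one extra call (a hedged sketch, reusing the assumed family and qualifier names):

    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("cf"), Bytes.toBytes("firstName"),
        CompareOp.EQUAL, Bytes.toBytes("bob"));
    filter.setFilterIfMissing(true);  // drop rows that have no firstName cell at all
    scan.setFilter(filter);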

Re: filtering rows for column values not null

2013-10-24 Thread Jean-Marc Spaggiari
Got it! I looked at it the opposite way: how to get only the rows where it's null. Toby is correct. JM 2013/10/24 Toby Lazar tla...@capitaltg.com I think Ted is looking for the SingleColumnValueFilter.setFilterIfMissing() method. Try setting that to true. Toby

Re: High Full GC count for Region server

2013-10-24 Thread Jean-Marc Spaggiari
Can you stop HBase and run fsck on Hadoop to see what your HDFS health looks like? 2013/10/24 Vimal Jain vkj...@gmail.com Hi Ted/Jean, Can you please help here? On Tue, Oct 22, 2013 at 10:29 PM, Vimal Jain vkj...@gmail.com wrote: Hi Ted, Yes, I checked the namenode and datanode logs and I found

Re: Optimizing bulk load performance

2013-10-24 Thread Jean-Marc Spaggiari
Hi Harry, Do you have more details on the exact load? Can you run vmstat and see what kind of load it is? Is it user? cpu? wio? I suspect your disks are the issue. There are two things here. First, we don't recommend RAID for the HDFS/HBase disks. The best is to simply mount the disks on 2

Re: Bulkload Problem

2013-10-24 Thread John
sure, I tried it again with hbase.client.retries.number=100 instead of 10 and it worked for me. But I'm not sure if this really solved the problem or if it was just luck. regards 2013/10/21 Ted Yu yuzhih...@gmail.com John: Can you let us know whether the Import succeeded this time ? If
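
For reference, the setting mentioned here can also be raised programmatically on the client configuration; a minimal sketch (the table name is hypothetical, and raising retries only papers over whatever made the regions temporarily unavailable):

    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.retries.number", 100);  // up from the 10 used before
    HTable table = new HTable(conf, "myTable");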

Add Columnsize Filter for Scan Operation

2013-10-24 Thread John
Hi, I'm currently writing an HBase Java program which iterates over every row in a table. I have to modify some rows if the column size (the number of columns in the row) is bigger than 25000. Here is my source code: http://pastebin.com/njqG6ry6 Is there any way to add a Filter to the scan operation
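
The pastebin source is not reproduced here, but a minimal sketch of one client-side approach (table and family names are assumptions) uses Scan.setBatch() so that very wide rows arrive as several smaller Results and never have to fit in memory at once, then counts cells per row key:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WideRowScan {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "myTable");
        Scan scan = new Scan();
        scan.setCaching(100);
        scan.setBatch(5000);  // rows wider than 5000 columns are split across several Results

        ResultScanner scanner = table.getScanner(scan);
        byte[] currentRow = null;
        long columns = 0;
        for (Result r : scanner) {
          if (currentRow == null || !Bytes.equals(currentRow, r.getRow())) {
            if (currentRow != null && columns > 25000) {
              // modify the previous (wide) row here
            }
            currentRow = r.getRow();
            columns = 0;
          }
          columns += r.size();  // cells in this (possibly partial) Result
        }
        if (currentRow != null && columns > 25000) {
          // handle the last row
        }
        scanner.close();
        table.close();
      }
    }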

Re: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Jean-Marc Spaggiari
Hi John, Sorry, this doesn't directly answer your question, but if you do a full table scan, you might want to do it with a MapReduce job so it will be way faster. For the filter, you might have to implement your own. I'm not sure there is any filter based on the cell size today :( JM 2013/10/24
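
A rough sketch of the MapReduce route suggested here (Hadoop 2-style job setup; class, table, and output-path names are hypothetical): TableMapReduceUtil runs one mapper per region, the mapper emits a partial cell count per row key (partial because of setBatch), and a reducer sums them and reports rows over the threshold:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WideRowReport {
      static class ChunkCountMapper extends TableMapper<Text, LongWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          // With setBatch(), one wide row can arrive as several map() calls.
          String key = Bytes.toStringBinary(row.get(), row.getOffset(), row.getLength());
          context.write(new Text(key), new LongWritable(value.size()));
        }
      }

      static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text rowKey, Iterable<LongWritable> counts, Context context)
            throws IOException, InterruptedException {
          long total = 0;
          for (LongWritable c : counts) total += c.get();
          if (total > 25000) context.write(rowKey, new LongWritable(total));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "wide-row-report");
        job.setJarByClass(WideRowReport.class);
        Scan scan = new Scan();
        scan.setCaching(500);        // fewer RPC round trips per mapper
        scan.setCacheBlocks(false);  // keep a full scan out of the block cache
        scan.setBatch(5000);
        TableMapReduceUtil.initTableMapperJob("myTable", scan, ChunkCountMapper.class,
            Text.class, LongWritable.class, job);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/wide-rows"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }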

HBase and HDFS append()

2013-10-24 Thread Sirianni, Eric
I am aware of the history of HBase and HDFS append and have read the various blog posts and JIRAs. In particular: https://issues.apache.org/jira/browse/HBASE-5676 https://issues.apache.org/jira/browse/HDFS-3120 My understanding is that HBase requires durable sync capabilities of

Re: Optimizing bulk load performance

2013-10-24 Thread Harry Waye
Hi JM, I took a snapshot on the initial run, before the changes: https://www.evernote.com/shard/s95/sh/b8e1516d-7c49-43f0-8b5f-d16bbdd3fe13/00d7c6cd6dd9fba92d6f00f90fb54fc1/res/4f0e20a2-1ecb-4085-8bc8-b3263c23afb5/screenshot.png Good timing, disks appear to be exploding (ATA errors) atm, thus I'm

hbase.zookeeper.quorum minimum nodes ?

2013-10-24 Thread Rajesh
I have a query on hbase.zookeeper.quorum. I have 2 nodes in a Hadoop cluster and installed HBase on them. The following is the configuration: file: hbase-site.xml <configuration> <property> <name>hbase.zookeeper.quorum</name> <value>hadoop-master</value>
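
For context, the client-side equivalent of that property (a hedged sketch; the extra hostnames are hypothetical): the quorum is a comma-separated list of ZooKeeper hosts, a single node is enough to run, and an odd number such as 3 or 5 is the usual recommendation for fault tolerance:

    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "hadoop-master,hadoop-slave1,hadoop-slave2");
    conf.set("hbase.zookeeper.property.clientPort", "2181");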

Re: HBase and HDFS append()

2013-10-24 Thread Nicolas Liochon
My understanding is that HBase requires durable sync capabilities of HDFS (i.e. hflush()/hsync()), but does *not* require file append capabilities. 99.99% true. The remaining 0.01% is an exceptional code path during data recovery (as a fallback mechanism to ensure that we can start the
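
A small sketch of the two HDFS primitives being contrasted (Hadoop 2.x client API; the path is arbitrary): hflush() pushes the bytes to the datanode pipeline and makes them visible to new readers, while hsync() additionally asks the datanodes to sync them to disk; neither requires append():

    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/wal-demo"));
    out.write(Bytes.toBytes("edit"));
    out.hflush();  // in the datanode pipeline, visible to readers
    out.hsync();   // datanodes additionally sync to their local disks
    out.close();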

Some questions on HLog

2013-10-24 Thread Wukang Lin
Hi all, Recently I read the source of HBase's HLog, and I have some questions that puzzle me a lot. Here they are: 1. Why use reflection to init a SequenceFile.Writer in SequenceFileLogWriter? I read HBASE-2312 but still can't catch the point. 2. It seems that HLog use

Re: hbase.zookeeper.quorum minimum nodes ?

2013-10-24 Thread Ted Yu
bq. Opening socket connection to server localhost/127.0.0.1:2181 Is localhost the same machine as hadoop-master ? Can you tell us more about your cluster config ? What version of HBase / zookeeper are you using ? Cheers On Thu, Oct 24, 2013 at 12:11 AM, Rajesh rajeshni...@gmail.com wrote:

Re: Add Columnsize Filter for Scan Operation

2013-10-24 Thread John
@Jean-Marc: Sure, I can do that, but that's a little bit complicated because the rows sometimes have millions of columns and I have to handle them in different batches because otherwise HBase crashes. Maybe I will try it later, but first I want to try the API version. It works okay so far, but

Re: Some questions on HLog

2013-10-24 Thread Wukang Lin
Hi Ted, Thank you for your response. For #1, I have tried to understand that comment in SequenceFileLogWriter, but I can't figure out why, instead of reflection, we don't use the version of SF.createWriter below directly: SequenceFile.Writer createWriter(FileSystem fs,

Re: Some questions on HLog

2013-10-24 Thread Ted Yu
This was due to the fact that when HBASE-2312 was integrated, there were many flavors of hadoop running in production. So the code had to support all the flavors. Cheers On Thu, Oct 24, 2013 at 9:27 AM, Wukang Lin vboylin1...@gmail.com wrote: Hi Ted, Thank you for your response. for #1,

RE: filtering rows for column values not null

2013-10-24 Thread Vladimir Rodionov
This will work ONLY if you add a single column to the scan. If you scan multiple columns you will need an additional filter (a reverse SkipFilter) which filters out all rows (outputs of SingleColumnValueFilter) which do not have the 'firstName' column. I do not think HBase provides a similar filter but you can

Re: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Jean-Marc Spaggiari
If the MR job crashes because of the number of columns, then we have an issue that we need to fix ;) Please open a JIRA and provide details if you are facing that. Thanks, JM 2013/10/24 John johnnyenglish...@gmail.com @Jean-Marc: Sure, I can do that, but that's a little bit complicated because the

Re: filtering rows for column values not null

2013-10-24 Thread Jean-Marc Spaggiari
I would have said: scan.addColumn(YOUR_CF, Bytes.toBytes("firstName")); But I'm not sure if it really makes a difference... 2013/10/24 Vladimir Rodionov vrodio...@carrieriq.com If 'firstName' is NULL - it is missing completely from the row. Explicitly add this column to the Scan you create:

Re: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Dhaval Shah
Jean, if we don't add setBatch to the scan, the MR job does cause HBase to crash due to OOME. We have run into this in the past as well. Basically the problem is: say I have a region server with 12GB of RAM and a row of size 20GB (an extreme example; in practice, HBase runs out of memory way
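
In scan terms, the two knobs that keep any single response from swallowing the heap (a hedged sketch; the numbers are arbitrary):

    Scan scan = new Scan();
    scan.setBatch(5000);  // at most 5000 cells of any one row per Result
    scan.setCaching(10);  // at most 10 Results shipped per RPC round trip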

Re: Add Columnsize Filter for Scan Operation

2013-10-24 Thread John
I already mentioned that here: https://groups.google.com/forum/#!topic/nosql-databases/ZWyc4zDursg ... . I'm not sure if it is an issue. After setting the batch size everything worked nicely for me. Anyway, that was another problem :) If there were a Filter, my current code would work fine with

Re: hbase 0.96 on hadoop 2.2.0

2013-10-24 Thread with.mirth.and.laughter
Solved by adding hadoop-common-2.2.0 to $HBASE_DIR/lib and removing the version that shouldn't have been included in the first place. I guess programmers can't be expected to document a working configuration on production releases.

Re: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Ted Yu
For streaming responses, there is this JIRA: HBASE-8691 High-Throughput Streaming Scan API On Thu, Oct 24, 2013 at 9:53 AM, Dhaval Shah prince_mithi...@yahoo.co.in wrote: Jean, if we don't add setBatch to the scan, MR job does cause HBase to crash due to OOME. We have run into this in the

hbase 0.96 on hadoop 2.2.0

2013-10-24 Thread with.mirth.and.laughter
Hi All, I've followed the HBase install guide as closely as I possibly can and I have tried a few different variants in my hbase-site.xml file. I'm still having problems getting HMaster to stay up, though. Could anyone tell me more about the issue that I'm seeing here: [hadoop@hadoop1 hadoop]$

Re: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Dhaval Shah
Interesting!! Can't wait to see this in action. I am already imagining huge performance gains   Regards, Dhaval From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org user@hbase.apache.org; Dhaval Shah prince_mithi...@yahoo.co.in Sent: Thursday, 24

Re: Some questions on HLog

2013-10-24 Thread Wukang Lin
Thank you, Ted. I got it :-). I read the comments on HDFS-744 and HBASE-5954; it seems HDFS supports true fsync, enabled by default, since Hadoop 2.0.0 alpha, while HBase does not have the ability to configure how the WAL and HFiles use HDFS's fsync until 0.98, so for our version of HBase (0.94.6 + Hadoop 2.0.0), we

Re: Optimizing bulk load performance

2013-10-24 Thread Harry Waye
Ok, I'm running a load job atm. I've added some possibly incomprehensible coloured lines to the graph: http://goo.gl/cUGCGG This is actually with one fewer node due to decommissioning to replace a disk, hence I guess the reason for one squiggly line showing no disk activity. I've included only the

Re: Optimizing bulk load performance

2013-10-24 Thread Jean-Marc Spaggiari
Can you try vmstat 2? 2 is the interval in seconds at which it will display the disk usage. In the extract here, nothing is running; only 8% is used (1% disk IO, 6% user, 1% sys). Run it on 2 or 3 different nodes while you are putting the load on the cluster. And take a look at the last 4 numbers and see

Re: How to create HTableInterface in coprocessor?

2013-10-24 Thread Ted Yu
If you encounter the following error, we can address it in another issue: public HTableInterface getTable(byte[] tableName, ExecutorService pool) throws IOException { if (managed) { throw new IOException("The connection has to be unmanaged."); } On Thu, Oct 24, 2013 at 3:25
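
For completeness, a rough sketch (0.96-era observer signature; the table, family, and qualifier names are made up) of taking a table handle from the coprocessor environment instead of constructing a connection yourself, which is the path that trips the managed-connection check quoted above:

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MirrorObserver extends BaseRegionObserver {
      @Override
      public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx, Put put,
          WALEdit edit, Durability durability) throws IOException {
        // The environment hands out a pooled, unmanaged table reference.
        HTableInterface other = ctx.getEnvironment().getTable(TableName.valueOf("other_table"));
        try {
          Put mirror = new Put(put.getRow());
          mirror.add(Bytes.toBytes("cf"), Bytes.toBytes("seen"), Bytes.toBytes("1"));
          other.put(mirror);
        } finally {
          other.close();  // return the reference to the pool
        }
      }
    }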

Re: Optimizing bulk load performance

2013-10-24 Thread Harry Waye
So just a short update, I'll read into it a little more tomorrow. This is from three of the nodes: https://gist.github.com/hazzadous/1264af7c674e1b3cf867 The first is the grey guy. Just glancing at it, it looks to fluctuate more than the others. I guess that could suggest that there are some

Re: Optimizing bulk load performance

2013-10-24 Thread Harry Waye
P.S. I guess this is turning more into a general Hadoop issue, but I'll keep the discussion here seeing as I have an audience, unless there are objections. On 24 October 2013 22:02, Harry Waye hw...@arachnys.com wrote: So just a short update, I'll read into it a little more tomorrow. This

Re: Optimizing bulk load performance

2013-10-24 Thread Jean-Marc Spaggiari
Your nodes are almost 50% idle... Might be something else. Sounds like it's not your disks nor your CPU... Maybe too many RCPs? Have you investigated your network side? netperf might be a good help for you. JM 2013/10/24 Harry Waye hw...@arachnys.com p.s. I guess this is more turning into a

Re: Optimizing bulk load performance

2013-10-24 Thread Harry Waye
Excuse the ignorance, RCP? On 24 October 2013 22:28, Jean-Marc Spaggiari jean-m...@spaggiari.orgwrote: Your nodes are almost 50% idle... Might be something else. Sound it's not your disks nor your CPU... Maybe to many RCPs? Have you investigate on your network side? netperf might be a good

Re: Optimizing bulk load performance

2013-10-24 Thread Ted Yu
I guess Jean meant RPCs. On Thu, Oct 24, 2013 at 2:34 PM, Harry Waye hw...@arachnys.com wrote: Excuse the ignorance, RCP? On 24 October 2013 22:28, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Your nodes are almost 50% idle... Might be something else. Sound it's not your disks

Re: Optimizing bulk load performance

2013-10-24 Thread Jean-Marc Spaggiari
Remote calls to a server. Just forget about it ;) Please verify the network bandwidth between your nodes. 2013/10/24 Harry Waye hw...@arachnys.com Excuse the ignorance, RCP? On 24 October 2013 22:28, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Your nodes are almost 50% idle...

Re: Optimizing bulk load performance

2013-10-24 Thread Harry Waye
Got it! Re. 50% utilisation, I forgot to mention that 6 cores does not include hyper-threading. Foolish I know, but that would explain CPU0 being at 50%. The nodes are as stated in http://www.hetzner.de/en/hosting/produkte_rootserver/ex10 bar the RAID1. On 24 October 2013 22:50, Jean-Marc

RE: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Vladimir Rodionov
Using the HBase client API (scanners) for M/R is so oldish :). HFiles have a well-defined format and it is much more efficient to read them directly. Best regards, Vladimir Rodionov Principal Platform Engineer Carrier IQ, www.carrieriq.com e-mail: vrodio...@carrieriq.com

Re: RE: Add Columnsize Filter for Scan Operation

2013-10-24 Thread Dhaval Shah
Well, that depends on your use case ;) There are many nuances/code complexities to keep in mind: - merging results of various HFiles (each region can have more than one) - merging results of the WAL - applying delete markers - how about data which is only in the memory of region servers and nowhere else

[ANNOUNCE] Phoenix v 2.1 released

2013-10-24 Thread James Taylor
The Phoenix team is pleased to announce the immediate availability of Phoenix 2.1 [1]. More than 20 individuals contributed to the release. Here are some of the new features now available: * Secondary Indexing [2] to create and automatically maintain global indexes over your primary table. -

Re: [ANNOUNCE] Phoenix v 2.1 released

2013-10-24 Thread Julian Zhou
Congratulations~ will refresh and have a try today. Best Regards, Julian On Oct 25, 2013, at 08:24 AM, James Taylor jtay...@salesforce.com wrote: The Phoenix team is pleased to announce the immediate availability of Phoenix 2.1 [1]. More than 20 individuals contributed to the release. Here are some

Re: [ANNOUNCE] Phoenix v 2.1 released

2013-10-24 Thread Ted Yu
From https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing : Is date_col a column from the data table? CREATE INDEX my_index ON my_table (date_col DESC, v1) INCLUDE (v3) SALT_BUCKETS=10, DATA_BLOCK_ENCODING='NONE'; On Thu, Oct 24, 2013 at 5:24 PM, James Taylor

Re: [ANNOUNCE] Phoenix v 2.1 released

2013-10-24 Thread James Taylor
Thanks, Ted. That was a typo which I've corrected. Yes, these are references to columns from your primary table. It should have read like this: CREATE INDEX my_index ON my_table (v2 DESC, v1) INCLUDE (v3) SALT_BUCKETS=10, DATA_BLOCK_ENCODING='NONE'; On Thu, Oct 24, 2013 at 5:40 PM, Ted Yu

Re: filtering rows for column values not null

2013-10-24 Thread Ted
Yes! That was exactly what I was looking for. Thanks. Ted On 10/24/13, Toby Lazar tla...@capitaltg.com wrote: I think Ted is looking for the SingleColumnValueFilter.setFilterIfMissing() method. Try setting that to true. Toby *** Toby Lazar Capital

Linear Scalability in HBase

2013-10-24 Thread Ramu M S
Hi All, I am running HBase 0.94.6 with 8 region servers and getting a throughput of around 15K read OPS and 20K write OPS per server through YCSB tests. The table is pre-created with 8 regions per region server and it has 120 million records of 700 bytes each. I increased the number of region servers

Re: RE: Add Columnsize Filter for Scan Operation

2013-10-24 Thread lars hofhansl
We need to finish up HBASE-8369 From: Dhaval Shah prince_mithi...@yahoo.co.in To: user@hbase.apache.org user@hbase.apache.org Sent: Thursday, October 24, 2013 4:38 PM Subject: Re: RE: Add Columnsize Filter for Scan Operation Well that depends on your use

Re: Linear Scalability in HBase

2013-10-24 Thread Ted Yu
How many YCSB clients were used in each setting ? Thanks On Oct 24, 2013, at 9:45 PM, Ramu M S ramu.ma...@gmail.com wrote: Hi All, I am running HBase 0.94.6 with 8 region servers and getting throughput of around 15K Read OPS and 20K Write OPS per server through YCSB tests. Table is pre

Re: 'hbase.client.scanner.caching' default value for HBase 0.90.6?

2013-10-24 Thread lars hofhansl
Excellent. :) From: A Laxmi a.lakshmi...@gmail.com To: user@hbase.apache.org user@hbase.apache.org; lars hofhansl la...@apache.org Sent: Wednesday, October 23, 2013 12:43 PM Subject: Re: 'hbase.client.scanner.caching' default value for HBase 0.90.6? Hi

Re: Fwd: High CPU utilization in few Region servers during read

2013-10-24 Thread lars hofhansl
No, this is different. All your data is still in the memstore. The memstore is organized as a skip list; nobody has ever tested that with 72GB. 256MB, 512MB, 1GB, sure... 72GB... no way. Same with a 96GB Java heap. Not with Oracle or OpenJDK and an application specifically for such large