Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver is pretty much the same. I also have per-table balancing turned on. I get mostly data local mappers with only a few rack local (maybe 10 of the 250 mappers). Currently the table is

HBase cluster replication firewall rules

2013-05-01 Thread Levy Meny
Hi, We are using HBase replication (over Apache 0.94.2) between two sites and we need to define firewall rules between the two sites. Can anyone provide some information regarding the ports that are used between the sites? Our understanding is: Replication is from site1 to site2. *

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread lars hofhansl
If you can, try 0.94.4+; it should significantly reduce the amount of bytes copied around in RAM during scanning, especially if you have wide rows and/or large key portions. That in turn makes scans scale better across cores, since RAM is a shared resource between cores (much like disk). It's

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Matt Corgan
Not that it's a long-term solution, but try major-compacting before running the benchmark. If the LSM tree is CPU bound in merging HFiles/KeyValues through the PriorityQueue, then reducing to a single file per region should help. The merging of HFiles during a scan is not heavily optimized yet.
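
The merge cost mentioned above can be pictured with a small k-way merge. The sketch below is an illustration only, not HBase's actual scanner code: the class name `SortedFileMerge` and the use of plain strings as row keys are assumptions made for the example. Each sorted list stands in for one HFile, and every emitted key costs a poll and a re-insert on the queue, which is the work that disappears when a region has a single file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Simplified k-way merge of sorted key lists; hypothetical, not HBase code. */
final class SortedFileMerge {

    /** One cursor per sorted input "file", ordered by its current head key. */
    private static final class Cursor implements Comparable<Cursor> {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }

        @Override
        public int compareTo(Cursor other) {
            return head.compareTo(other.head);
        }
    }

    /** Merges already-sorted key lists into one sorted list. */
    static List<String> merge(List<List<String>> sortedFiles) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<String> file : sortedFiles) {
            if (!file.isEmpty()) {
                heap.add(new Cursor(file.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();      // cursor with the smallest head key
            out.add(c.head);
            if (c.rest.hasNext()) {      // advance and re-insert: O(log k) per key
                c.head = c.rest.next();
                heap.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("row1", "row4"),
                Arrays.asList("row2", "row5"),
                Arrays.asList("row3"));
        System.out.println(merge(files));
    }
}
```

With k input lists the queue holds up to k cursors, so each key costs roughly log k comparisons; with one list the queue never holds more than a single cursor.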

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Naidu MS
Hi, I have two questions regarding HDFS and the jps utility. I am new to Hadoop and started learning Hadoop this past week. 1. Whenever I run start-all.sh and then jps in the console, it shows the processes started: *naidu@naidu:~/work/hadoop-1.0.4/bin$ jps* *22283 NameNode* *23516 TaskTracker* *26711

Re: Read access pattern

2013-05-01 Thread Naidu MS
Hi, I have two questions regarding HDFS and the jps utility. I am new to Hadoop and started learning Hadoop this past week. 1. Whenever I run start-all.sh and then jps in the console, it shows the processes started: *naidu@naidu:~/work/hadoop-1.0.4/bin$ jps* *22283 NameNode* *23516 TaskTracker* *26711

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread ramkrishna vasudevan
This happens when your Java process is running in debug mode and the suspend='Y' option is selected. Regards, Ram On Wed, May 1, 2013 at 12:55 PM, Naidu MS sanyasinaidu.malla...@gmail.com wrote: Hi, I have two questions regarding HDFS and the jps utility. I am new to Hadoop and started learning Hadoop

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread ramkrishna vasudevan
Sorry, I think someone hijacked this thread and I replied to the hijacking post. Naidu, please start a new thread if you have questions rather than hijacking this one. Regards, Ram On Wed, May 1, 2013 at 12:57 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: This happens when your java

Re: Scala and Hbase, hbase-default.xml file seems to be for and old version of HBase (null)

2013-05-01 Thread Håvard Wahl Kongsgård
Yes, that is true according to the docs. However, there is still something strange with the classpath: import org.apache.hadoop.hbase.HBaseConfiguration import org.apache.hadoop.hbase.client.{HBaseAdmin,HTable,Put,Get} import org.apache.hadoop.hbase.util.Bytes val conf = new HBaseConfiguration() val admin

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Jean-Marc Spaggiari
@Lars, how did you calculate the 35K/row size? I'm not able to find the same number. @Bryan, Matt's idea below is good. With the Hadoop test you always had data locality. With your HBase test, maybe not. Can you take a look at the JMX console and tell us your locality %? Also, over those 45

Re: Read access pattern

2013-05-01 Thread Michael Segel
Unfortunately as this idea keeps popping up, you are going to have this discussion. 1) As you admit... salting is bad when your primary access vector is get()s. 2) Range scans. Instead of 1 range scan, you now have N where N is the number of salt values. In this case 10. You wouldn't think
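
The fan-out described in point 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the key layout `<salt>|<original key>`, the class name `SaltedScanRanges`, and the sample keys are all hypothetical and do not come from the thread. It only computes the start/stop key pairs; issuing the scans and merging their results would be additional client-side work.

```java
import java.util.ArrayList;
import java.util.List;

/** Expands one logical range scan into one range per salt bucket (hypothetical layout). */
final class SaltedScanRanges {

    /** A [start, stop) row-key pair for a single scan. */
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + stop + ")";
        }
    }

    /** Returns saltBuckets ranges, each prefixed with its salt value. */
    static List<Range> expand(String start, String stop, int saltBuckets) {
        List<Range> ranges = new ArrayList<>();
        for (int salt = 0; salt < saltBuckets; salt++) {
            String prefix = salt + "|";   // assumed key layout: "<salt>|<original key>"
            ranges.add(new Range(prefix + start, prefix + stop));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // With 10 salt values, one logical scan becomes 10 physical scans.
        for (Range r : expand("user100", "user200", 10)) {
            System.out.println(r);
        }
    }
}
```

The caller has to run every returned range and re-sort the combined results if global key order matters, which is the extra cost the message is pointing at.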

Re: Read access pattern

2013-05-01 Thread Shahab Yunus
I see what you are saying, Michael, but I think the following is a blanket assumption: "Think of it this way... the operation was a success but the patient died." This is not always the case. Yes, if your use-case/system is such that it will have lots of users trying to access then perhaps N users

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Michael Segel
I'd say go to Avro over protobufs in terms of redesigning your schema. With respect to CPUs, you don't say what your system looks like: Intel vs. AMD, number of physical cores, what else you're running on the machine (number of mapper/reducer slots), etc. In terms of the schema... How are you

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
Yes, I would like to try this. If you can point me to the pom.xml patch, that would save me some time. On Tuesday, April 30, 2013, lars hofhansl wrote: If you can, try 0.94.4+; it should significantly reduce the amount of bytes copied around in RAM during scanning, especially if you have wide

Re: Scala and Hbase, hbase-default.xml file seems to be for and old version of HBase (null)

2013-05-01 Thread Michael Segel
What about extracting the jar to get the file and putting it manually on the classpath? At least it will help in terms of debugging the underlying problem. On May 1, 2013, at 3:24 AM, Håvard Wahl Kongsgård haavard.kongsga...@gmail.com wrote: yes, true according to the docs. however,

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
Yes I have monitored GC, CPU, disk and network IO, anything else I could think of. Only the CPU usage by the regionserver is on the high side. I mentioned data local jobs make up generally 240 of the 250 mappers (96%) - I get this information from the jobtracker. Does the JMX console give more

Advice on setting default HBase table mapping attributes within Apache Gora

2013-05-01 Thread Lewis John Mcgibbney
Hi, Currently in Gora, we support the following table attributes, which we specify when mapping data into HBase: compression, blockCache, blockSize, bloomFilter, maxVersions, timeToLive, inMemory. These expand to the following: HColumnDescriptor columnDescriptor = getOrCreateFamily(familyName,

Re: Advice on setting default HBase table mapping attributes within Apache Gora

2013-05-01 Thread Ted Yu
What version of HBase are you using? Assuming it is 0.94.x, you can find the default values in src/main/resources/hbase-default.xml, e.g. <property> <name>hfile.block.cache.size</name> <value>0.25</value> <description> Percentage of maximum heap (-Xmx setting) to allocate to block cache
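
To read such a default programmatically, a sketch along these lines may help. It uses only the JDK's DOM parser; the class name `DefaultPropertyLookup` and the inline XML string are assumptions for the example (the property name and value are the ones quoted above, and a `<configuration>` root element is assumed). Inside an HBase application one would normally rely on the HBase configuration classes instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Looks up a <property> value in Hadoop-style configuration XML (illustrative helper). */
final class DefaultPropertyLookup {

    /** Returns the value of the named property, or null if it is not present. */
    static String lookup(String xml, String wanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                if (name.equals(wanted)) {
                    return p.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>hfile.block.cache.size</name><value>0.25</value>"
                + "</property></configuration>";
        System.out.println(lookup(xml, "hfile.block.cache.size"));
    }
}
```

Returning null for a missing property keeps the helper small; a caller that needs a fallback can supply its own default.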

Re: Advice on setting default HBase table mapping attributes within Apache Gora

2013-05-01 Thread Lewis John Mcgibbney
Hi Ted, Thank you for the reply. This is where I drop a bomb... which I apologize for; I should have mentioned it in the original email. We currently pull the 0.90.4 Maven artifact within Gora trunk! We plan to upgrade to 0.94.X [0] after our next release (next few weeks). Thanks Ted [0]

Re: Advice on setting default HBase table mapping attributes within Apache Gora

2013-05-01 Thread Ted Yu
0.90.x code base is no longer actively maintained. Looking forward to the upgrade of HBase in Gora. On Wed, May 1, 2013 at 11:49 AM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Ted, Thank you for reply. This is where I drop a bomb... which I reservedly apologize for, I should

Re: Advice on setting default HBase table mapping attributes within Apache Gora

2013-05-01 Thread alxsss
Hi, As far as I remember, there were attempts to add filtering on hbase side to nutch-2.x commands, which could use SingleColumnValue filters that are available in hbase-0.95. So, I think it is advisable to upgrade hbase in gora to this version. Thanks. Alex. -Original

Re: Advice on setting default HBase table mapping attributes within Apache Gora

2013-05-01 Thread Lewis John Mcgibbney
Thanks to both of you. We've had a struggle with lots of other stuff. All of this HBase related stuff will be addressed in the next development drive. On Wed, May 1, 2013 at 12:03 PM, alx...@aim.com wrote: Hi, As far as I remember, there were attempts to add filtering on hbase side to

Re: Advice on setting default HBase table mapping attributes within Apache Gora

2013-05-01 Thread Ted Yu
The following filters are in 0.94 code base as well: ./src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueExcludeFilter.java ./src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java On Wed, May 1, 2013 at 12:03 PM, alx...@aim.com wrote: Hi, As far as I remember,

Re: HBase is not running.

2013-05-01 Thread Yves S. Garret
Ok, I spoke too soon. I tried going from Nutch to Hive... not supported (but adding support to it sounds like a fun side-project :) ). But, I can go to HBase. Jean-Marc, I do have a question for you. When you said that I should get the UI going before anything else, what did you mean? I'm

Re: HBase is not running.

2013-05-01 Thread Jean-Marc Spaggiari
Hi Yves, Nice to see you back ;) The UI is http://192.168.x.x:60010/master-status If you don't have the master UI working, there is no need to try the shell, it will not work. JM 2013/5/1 Yves S. Garret yoursurrogate...@gmail.com Ok, I spoke too soon. I tried going from Nutch to Hive...

Re: HBase is not running.

2013-05-01 Thread Mohammad Tariq
Hello Yves, I think by that JM means that you should first make sure that all your HBase daemons are running fine. The web UI is a pretty convenient tool which allows you to monitor everything in a simpler way. If you are able to see the web UI properly, it means everything is in place and

Re: HBase is not running.

2013-05-01 Thread shashwat shriparv
Comment out everything in the /etc/hosts file, add 127.0.0.1 localhost, and then try. *Thanks & Regards* ∞ Shashwat Shriparv On Thu, May 2, 2013 at 1:57 AM, Mohammad Tariq donta...@gmail.com wrote: Hello Yves, I think by that JM means that you should first make sure that all you

JVM seg fault in HBase region server

2013-05-01 Thread Varun Sharma
Hi, I am seeing the following which is a JVM segfault: hbase-regionser[28734]: segfault at 8 ip 7f269bcc307e sp 7fff50f7e638 error 4 in libc-2.15.so[7f269bc51000+1b5000] Benoit Tsuna reported a similar issue a while back -

Re: JVM seg fault in HBase region server

2013-05-01 Thread Varun Sharma
From what I see from ldd --version: ldd (Ubuntu EGLIBC 2.15-0ubuntu10.3) 2.15. We are running eglibc, which is somewhat different from glibc - http://en.wikipedia.org/wiki/Embedded_GLIBC. It seems that this is a problem with Ubuntu; have folks seen this on non-Ubuntu installs? Thanks Varun On

Re: Coprocessors

2013-05-01 Thread James Taylor
Sudarshan, Below are the results that Mujtaba put together. He put together two versions of your schema: one with the ATTRIBID as part of the row key and one with it as a key value. He also benchmarked the query time both when all of the data was in the cache versus when all of the data was read

Re: HBase is not running.

2013-05-01 Thread Yves S. Garret
Hi Jean-Marc, I'll go back through this tutorial once more. http://hbase.apache.org/book/quickstart.html On Wed, May 1, 2013 at 4:27 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Yves, Nice to see you back ;) The UI is http://192.168.x.x:60010/master-status If you don't

Re: HBase is not running.

2013-05-01 Thread Yves S. Garret
Done. I'll go through the previously mentioned tutorial with this in mind. Thank you for your help. On Wed, May 1, 2013 at 4:39 PM, shashwat shriparv dwivedishash...@gmail.com wrote: Comment out everything in the /etc/hosts file, add 127.0.0.1 localhost, and then try. *Thanks & Regards*

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
I tried running my test with 0.94.4, unfortunately performance was about the same. I'm planning on profiling the regionserver and trying some other things tonight and tomorrow and will report back. On May 1, 2013, at 8:00 AM, Bryan Keller brya...@gmail.com wrote: Yes I would like to try this,

Re: HBase is not running.

2013-05-01 Thread Yves S. Garret
Hi guys, one more question. I'm looking at this http://hbase.apache.org/book/quickstart.html link, section 1.2.1, where I have to modify conf/hbase-site.xml for the properties hbase.rootdir and hbase.zookeeper.property.dataDir. To what should I set hbase.rootdir? At the moment, I have hbase.rootdir

Re: HBase is not running.

2013-05-01 Thread Yves S. Garret
One more little update. I ran this command in HBASE_HOME [ $ bin/start-hbase.sh ], using these configuration settings: http://bin.cakephp.org/view/1134614486 After I ran it, I checked $HBASE_HOME/logs/hbase-ysg-master-ysg.connect.log and this is what I saw: http://bin.cakephp.org/view/823736802

Re: HBase is not running.

2013-05-01 Thread Yves S. Garret
Forgot to add, out of the 3 files in logs: -rw-rw-r--. 1 ysg ysg 26439 May 1 22:04 hbase-ysg-master-ysg.connect.log -rw-rw-r--. 1 ysg ysg 0 May 1 22:04 hbase-ysg-master-ysg.connect.out -rw-rw-r--. 1 ysg ysg 0 May 1 22:04 SecurityAuth.audit Only the .log file has anything in it. Not

Re: HBase is not running.

2013-05-01 Thread Mohammad Tariq
hbase.rootdir is the directory HBase writes data to. If you are planning to have a distributed HBase setup, then set this property to a directory in your HDFS, like hdfs://NN_MACHINE:9000/hbase. Otherwise, point it to some directory on your local FS. And for hbase.zookeeper.property.dataDir, create a separate
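
As a quick sanity check on which of the two cases a given hbase.rootdir value falls into, one could inspect its URI scheme. The helper below is a hypothetical sketch, not part of HBase: the class name `RootDirKind` and the sample values (including the host name `namenode`) are assumptions for illustration.

```java
import java.net.URI;

/** Classifies an hbase.rootdir value by its URI scheme (illustrative helper only). */
final class RootDirKind {

    /** True when the value points at HDFS, i.e. it uses the "hdfs" scheme. */
    static boolean isHdfs(String rootDir) {
        String scheme = URI.create(rootDir).getScheme();   // null when no scheme is given
        return "hdfs".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        // Hypothetical values; substitute the ones from your own hbase-site.xml.
        System.out.println(isHdfs("hdfs://namenode:9000/hbase"));
        System.out.println(isHdfs("file:///tmp/hbase"));
    }
}
```

A value with no scheme at all, such as a bare local path, is treated as non-HDFS here.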

Re: Very poor read performance with composite keys in hbase

2013-05-01 Thread Anoop John
Navis, thanks for the issue link. Currently the read queries will start MR jobs as usual for reading from HBase, correct? Is there any plan for supporting noMR? -Anoop- On Thu, May 2, 2013 at 7:09 AM, Navis류승우 navis@nexr.com wrote: Currently, hive storage handler reads rows one by

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread lars hofhansl
Hmm... Did you actually use exactly version 0.94.4, or the latest, 0.94.7? I would be very curious to see profiling data. -- Lars - Original Message - From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Wednesday, May 1, 2013 6:01 PM Subject:

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
I used exactly 0.94.4, pulled from the tag in subversion. On May 1, 2013, at 9:41 PM, lars hofhansl la...@apache.org wrote: Hmm... Did you actually use exactly version 0.94.4, or the latest 0.94.7. I would be very curious to see profiling data. -- Lars - Original Message -