Problems while exporting from HBase to CSV file

2013-06-27 Thread Vimal Jain
Hi, I am trying to export from HBase to a CSV file. I am using the Scan class to scan all data in the table, but I am facing some problems while doing it. 1) My table has around 1.5 million rows and around 150 columns per row, so I cannot use the default scan() constructor as it will scan the whole
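
A bounded scan is the usual way around this; a rough sketch against the 0.94-era client API (the table name, file path, and caching/batch values here are only illustrative, not from the thread):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCsvExport {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // hypothetical table name
        Scan scan = new Scan();
        scan.setCaching(500);  // rows fetched per RPC instead of one at a time
        scan.setBatch(50);     // columns per Result, caps client-side memory
        BufferedWriter out = new BufferedWriter(new FileWriter("export.csv"));
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result row : scanner) {
                StringBuilder line = new StringBuilder(Bytes.toStringBinary(row.getRow()));
                for (KeyValue kv : row.raw()) {
                    line.append(',').append(Bytes.toStringBinary(kv.getValue()));
                }
                out.write(line.toString());
                out.newLine();
            }
        } finally {
            scanner.close();
            out.close();
            table.close();
        }
    }
}
```

Note that with setBatch set below the column count, a 150-column row arrives as several consecutive Result objects sharing one row key, so a real export would need to stitch those back together before emitting a CSV line.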

Re: Problems while exporting from HBase to CSV file

2013-06-27 Thread Azuryy Yu
Sorry, maybe Phoenix is not suitable for you. On Thu, Jun 27, 2013 at 3:21 PM, Azuryy Yu azury...@gmail.com wrote: 1) Scan.setCaching() specifies the number of rows for caching that will be passed to scanners. And what's your block cache size? But if the OOM is from the client, not the server

flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Hi All, I wanted some help understanding what's going on with my current setup. I updated my config to the following settings:

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>

Re: flushing + compactions after config change

2013-06-27 Thread Anoop John
"The flush size is at 128m and there is no memory pressure": you mean there is enough memstore-reserved heap in the RS, so that there won't be premature flushes because of global heap pressure? What is the RS max mem, and how many regions and CFs in each? Can you check whether the flushes happening

Re: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Thanks for the quick response Anoop. The current memstore reserve (IIRC) would be 0.35 of total heap, right? The RS total heap is 10231MB; used is at 5000MB. Total number of regions is 217: approx 150 regions with 2 families, ~60 with 1 family, and the remaining with 3 families. How to

Re: flushing + compactions after config change

2013-06-27 Thread 谢良
If you've reached the global memstore upper limit, you'll find "Blocking updates on" in your log files (see MemStoreFlusher.reclaimMemStoreMemory); if it's caused by too many log files, you'll find "Too many hlogs: logs=" (see HLog.cleanOldLogs). Hope it's helpful for you :) Best, Liang

Re: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Thanks Liang! Found the logs. I had gone overboard with my greps and missed the "Too many hlogs" line for the regions that I was trying to debug. A few sample log lines: 2013-06-27 07:42:49,602 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush

Re: flushing + compactions after config change

2013-06-27 Thread Anoop John
The config hbase.regionserver.maxlogs specifies the max number of logs and defaults to 32. But remember, if there are that many log files to replay then the MTTR will become longer (RS-down case). -Anoop- On Thu, Jun 27, 2013 at 1:59 PM, Viral Bajaria viral.baja...@gmail.com wrote: Thanks Liang!
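
For reference, this setting lives in hbase-site.xml; a minimal fragment (the value 48 is only an illustrative choice, not a recommendation from the thread):

```xml
<property>
  <name>hbase.regionserver.maxlogs</name>
  <!-- default is 32; raising it delays "Too many hlogs" forced flushes,
       at the cost of a longer WAL replay (worse MTTR) if the RS dies -->
  <value>48</value>
</property>
```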

Re: flushing + compactions after config change

2013-06-27 Thread Azuryy Yu
Hey Viral, which HBase version are you using? On Thu, Jun 27, 2013 at 5:03 PM, Anoop John anoop.hb...@gmail.com wrote: The config hbase.regionserver.maxlogs specifies the max number of logs and defaults to 32. But remember if there are so many log files to replay then the MTTR will become

Re: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
0.94.4 with plans to upgrade to the latest 0.94 release. On Thu, Jun 27, 2013 at 2:22 AM, Azuryy Yu azury...@gmail.com wrote: hey Viral, Which hbase version are you using?

Re: flushing + compactions after config change

2013-06-27 Thread Azuryy Yu
Can you paste your JVM options here? And do you have extensive writes on your HBase cluster? On Thu, Jun 27, 2013 at 5:47 PM, Viral Bajaria viral.baja...@gmail.com wrote: 0.94.4 with plans to upgrade to the latest 0.94 release. On Thu, Jun 27, 2013 at 2:22 AM, Azuryy Yu azury...@gmail.com

Re: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
I do have a heavy write operation going on. Actually, heavy is relative: not all tables/regions see the same amount of writes at the same time. There is definitely a burst of writes that can happen on some regions. In addition, there are some processing jobs which play catch-up and

Re: flushing + compactions after config change

2013-06-27 Thread 谢良
By the way, don't use CMSIncrementalMode; IIRC, it has actually been removed from HotSpot upstream. From: Viral Bajaria [viral.baja...@gmail.com] Sent: June 27, 2013 18:08 To: user@hbase.apache.org Subject: Re: flushing + compactions after config change I do have a

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

2013-06-27 Thread Shahab Yunus
I think you will need to update your hash function and redistribute the data. As far as I know this has been one of the drawbacks of this approach (and the Sematext library). Regards, Shahab On Wed, Jun 26, 2013 at 7:24 PM, Joarder KAMAL joard...@gmail.com wrote: May be a simple question to answer

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

2013-06-27 Thread Joarder KAMAL
Thanks Shahab for the reply. I was also thinking the same way. Could you guide me to any reference which can confirm this understanding? Regards, Joarder Kamal On 27 June 2013 23:24, Shahab Yunus shahab.yu...@gmail.com wrote: I think you will need to update your hash

HBase replication and DFS replication?

2013-06-27 Thread Jason Huang
Hello, I am a bit confused about how the configurations of HBase replication and DFS replication work together. My application deploys on an HBase cluster (0.94.3) with two region servers. The two Hadoop datanodes run on the same two region servers. Because we only have two datanodes, dfs.replication was

Re: HBase replication and DFS replication?

2013-06-27 Thread Dave Wang
Jason, HBase replication is for between two HBase clusters, as you state. What you are seeing is merely the expected behavior within a single cluster. DFS replication is not directly involved here: the shell ends up acting like any other HBase client and constructs the scan the same way (i.e.

Re: HBase replication and DFS replication?

2013-06-27 Thread Jason Huang
Makes a lot of sense. Thanks Dave. Jason On Thu, Jun 27, 2013 at 10:26 AM, Dave Wang d...@cloudera.com wrote: Jason, HBase replication is for between two HBase clusters as you state. What you are seeing is merely the expected behavior within a single cluster. DFS replication is not

Re: flushing + compactions after config change

2013-06-27 Thread Azuryy Yu
Your JVM options are not enough. I will give you some details when I get back to the office tomorrow. --Sent from my Sony mobile. On Jun 27, 2013 6:09 PM, Viral Bajaria viral.baja...@gmail.com wrote: I do have a heavy write operation going on. Actually heavy is relative. Not all tables/regions are

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

2013-06-27 Thread Shahab Yunus
I don't have a particular document or source stating this, but I think it is actually kind of self-explanatory if you think about the algorithm. Anyway, you can read this: http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ And

Re: Schema design for filters

2013-06-27 Thread Michael Segel
Not an easy task. You first need to determine how you want to store the data within a column and/or apply a type constraint to a column. Even if you use JSON records to store your data within a column, does an equality comparator exist? If not, you would have to write one. (I kinda think

Profiling map reduce jobs?

2013-06-27 Thread David Poisson
Howdy, I want to take a look at a MR job which seems to be slower than I had hoped. Mind you, this MR job is only running on a pseudo-distributed VM (cloudera cdh4). I have modified my mapred-site.xml with the following (that last one is commented out because it crashes my MR job):

Re: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Thanks Azuryy, looking forward to it. Does DEFERRED_LOG_FLUSH impact the number of WAL files that will be created? I tried looking around but could not find the details. On Thu, Jun 27, 2013 at 7:53 AM, Azuryy Yu azury...@gmail.com wrote: your JVM options are not enough. I will give you some detail

Re: flushing + compactions after config change

2013-06-27 Thread Jean-Daniel Cryans
No, all your data eventually makes it into the log, just potentially not as quickly :) J-D On Thu, Jun 27, 2013 at 2:06 PM, Viral Bajaria viral.baja...@gmail.com wrote: Thanks Azuryy. Look forward to it. Does DEFERRED_LOG_FLUSH impact the number of WAL files that will be created ? Tried
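
In the 0.94 API, deferred log flush is a per-table flag on the table descriptor: edits are still written to the WAL, just synced periodically instead of on every write. A hedged sketch (the table name is hypothetical, and the table must be disabled before modifyTable):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class DeferredFlushExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] name = Bytes.toBytes("mytable");  // hypothetical table name
        HTableDescriptor desc = admin.getTableDescriptor(name);
        // WAL entries are group-synced on a timer rather than per edit:
        // fewer HLog syncs, same eventual contents, small data-loss window.
        desc.setDeferredLogFlush(true);
        admin.disableTable(name);
        admin.modifyTable(name, desc);
        admin.enableTable(name);
        admin.close();
    }
}
```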

Re: Schema design for filters

2013-06-27 Thread Kristoffer Sjögren
I realize standard comparators cannot solve this. However, I do know the type of each column, so writing custom list comparators for boolean, char, byte, short, int, long, float, double seems quite straightforward. Long arrays, for example, are stored as a byte array with 8 bytes per item, so a
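
Such a comparator boils down to decoding fixed-width items out of the cell's byte[]. A self-contained sketch of just the decoding step, assuming the big-endian layout that HBase's Bytes.toBytes(long) produces (the class and method names here are mine, not from any HBase API):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Decode a column value that stores a long[] as 8 big-endian bytes per item.
public class LongListCodec {
    public static long[] decode(byte[] value) {
        if (value.length % 8 != 0) {
            throw new IllegalArgumentException("value length is not a multiple of 8");
        }
        long[] out = new long[value.length / 8];
        ByteBuffer buf = ByteBuffer.wrap(value); // ByteBuffer is big-endian by default
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getLong();
        }
        return out;
    }

    // A contains-style predicate that a custom filter/comparator could apply.
    public static boolean contains(byte[] value, long target) {
        for (long v : decode(value)) {
            if (v == target) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] encoded = ByteBuffer.allocate(24).putLong(1L).putLong(42L).putLong(7L).array();
        System.out.println(Arrays.toString(decode(encoded))); // prints [1, 42, 7]
        System.out.println(contains(encoded, 42L));           // prints true
    }
}
```

The same pattern covers the other fixed-width types by swapping the item width and the corresponding ByteBuffer getter.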

Re: Schema design for filters

2013-06-27 Thread Michael Segel
You have to remember that HBase doesn't enforce any sort of typing. That's why this can be difficult. You'd have to write a coprocessor to enforce a schema on a table. Even then, YMMV if you're writing JSON structures to a column, because while the contents of the structures could be the same,

Re: Schema design for filters

2013-06-27 Thread Kristoffer Sjögren
I see your point. Everything is just bytes. However, the schema is known and every row is formatted according to this schema, although some columns may not exist, that is, no value exists for this property on this row. So if I'm able to apply these typed comparators to the right cell values it may

Re: Problems while exporting from HBase to CSV file

2013-06-27 Thread Michael Segel
Phoenix, Hive, Pig, or Java would all work. But regarding Azuryy Yu's post: the OP is doing a simple scan() to get rows. If the OP is hitting an OOM exception then it's a code issue on the part of the OP. On Jun 27, 2013, at 2:22 AM, Azuryy Yu azury...@gmail.com wrote: Sorry, maybe Phoenix is not

Re: Schema design for filters

2013-06-27 Thread Michael Segel
Ok... If you want to do type checking and schema enforcement, you will need to do this as a coprocessor. The quick and dirty way (not recommended) would be to hard-code the schema into the coprocessor code. A better way: at start-up, load up ZK to manage the set of known table
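
A skeleton of that coprocessor approach against the 0.94 observer API; the type rule below is a deliberately trivial placeholder, and the ZK-backed schema lookup the post describes is elided:

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.DoNotRetryIOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class SchemaEnforcingObserver extends BaseRegionObserver {
    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                       Put put, WALEdit edit, boolean writeToWAL) throws IOException {
        // Placeholder rule: every value must decode as a long[] (8-byte items).
        // A real implementation would look up the column's declared type here.
        for (List<KeyValue> kvs : put.getFamilyMap().values()) {
            for (KeyValue kv : kvs) {
                if (kv.getValueLength() % 8 != 0) {
                    throw new DoNotRetryIOException("schema violation on column "
                        + Bytes.toStringBinary(kv.getQualifier()));
                }
            }
        }
    }
}
```

Rejecting the Put with DoNotRetryIOException keeps the client from retrying a write that can never pass validation.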

Re: flushing + compactions after config change

2013-06-27 Thread Viral Bajaria
Hey JD, Thanks for the clarification. I also came across a previous thread which sort of talks about a similar problem. http://mail-archives.apache.org/mod_mbox/hbase-user/201204.mbox/%3ccagptdnfwnrsnqv7n3wgje-ichzpx-cxn1tbchgwrpohgcos...@mail.gmail.com%3E I guess my problem is also similar to

Re: Schema design for filters

2013-06-27 Thread Kristoffer Sjögren
Thanks for your help Mike, much appreciated. I don't store rows/columns in JSON format. The schema is exactly that of a specific Java class, where the rowkey is a unique object identifier with the class type encoded into it. Columns are the field names of the class, and the values are those of the

Re: flushing + compactions after config change

2013-06-27 Thread Azuryy Yu
Hi Viral, the following are all needed for CMS: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=300
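
These flags typically go into hbase-env.sh; a fragment showing the placement (the heap size is an illustrative value, not a recommendation from this thread):

```sh
# hbase-env.sh -- CMS GC settings for the region servers
export HBASE_REGIONSERVER_OPTS="-Xmx10g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 \
  -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=300"
```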

Re: Schema design for filters

2013-06-27 Thread James Taylor
Hi Kristoffer, Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You could model your schema much like an O/R mapper and issue SQL queries through Phoenix for your filtering. James @JamesPlusPlus http://phoenix-hbase.blogspot.com On Jun 27, 2013, at 4:39 PM, Kristoffer
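
With Phoenix, the typed-comparator problem from this thread becomes ordinary SQL over a declared schema; a hedged sketch (table, family, and column names are invented for illustration):

```sql
-- Declare a typed schema over the HBase table; "f" is the column family.
CREATE TABLE my_objects (
    id      VARCHAR NOT NULL PRIMARY KEY,
    f.score BIGINT,
    f.flag  BOOLEAN
);

-- Typed filtering without hand-written comparators.
SELECT id FROM my_objects WHERE f.score > 100 AND f.flag = TRUE;
```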

What is the max number of columns that a column family can have?

2013-06-27 Thread ch huang
ATT

Re: What is the max number of columns that a column family can have?

2013-06-27 Thread Ted Yu
Your row can be very wide. Take a look at the first paragraph in this comment: https://issues.apache.org/jira/browse/HBASE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633620#comment-13633620 Cheers On Fri, Jun 28, 2013 at 10:40 AM, ch huang

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

2013-06-27 Thread ramkrishna vasudevan
I would suggest you write a custom load balancer and have your hashing algo determine how the load balancing should happen. Hope this helps. Regards, Ram On Fri, Jun 28, 2013 at 5:32 AM, Joarder KAMAL joard...@gmail.com wrote: Thanks St.Ack for mentioning about the

Does an HBase cluster support multiple instances?

2013-06-27 Thread ch huang
Hi all: Can HBase start more than one instance, like MySQL? If so, how do I manage these instances? Thanks a lot

What's the relationship between a Hadoop datanode and an HBase region node?

2013-06-27 Thread ch huang
ATT

How many column families in one table?

2013-06-27 Thread Vimal Jain
Hi, How many column families should there be in an HBase table? Is there any performance issue in read/write if we have more column families? I have designed one table with around 14 column families, each having on average 6 qualifiers. Is it a good design? -- Thanks and Regards,

RE: What's the relationship between a Hadoop datanode and an HBase region node?

2013-06-27 Thread Sandeep L
HBase regions are stored as HFiles, and HFiles are stored on HDFS datanodes. Thanks, Sandeep. Date: Fri, 28 Jun 2013 13:08:58 +0800 Subject: what's the relationship between hadoop datanode and hbase region node? From: justlo...@gmail.com To: user@hbase.apache.org ATT