Hi,
I am trying to export from HBase to a CSV file.
I am using the Scan class to scan all the data in the table,
but I am facing some problems while doing it.
1) My table has around 1.5 million rows and around 150 columns per
row, so I cannot use the default scan() constructor, as it will scan the whole
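Something along these lines should keep the client's memory bounded (a minimal sketch, untested, assuming a 0.94-style client; the table name and the caching/batch values are only examples):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanExportSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable"); // example table name

    Scan scan = new Scan();
    scan.setCaching(500); // rows fetched per RPC, bounds client memory
    scan.setBatch(50);    // columns per Result, for very wide rows

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result result : scanner) {
        // format 'result' as one or more CSV lines here
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}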
Sorry, maybe Phoenix is not suitable for you.
On Thu, Jun 27, 2013 at 3:21 PM, Azuryy Yu azury...@gmail.com wrote:
1) Use Scan.setCaching() to specify the number of rows for caching that will
be passed to scanners.
And what's your block cache size?
But if the OOM is from the client, not the server
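For reference, the block cache fraction is controlled by hfile.block.cache.size; a quick way to read the effective value from a client Configuration (a hedged sketch; 0.25 is the usual default):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BlockCacheCheck {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Fraction of the region server heap given to the block cache.
    float fraction = conf.getFloat("hfile.block.cache.size", 0.25f);
    System.out.println("hfile.block.cache.size = " + fraction);
  }
}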
Hi All,
I wanted some help on understanding what's going on with my current setup.
I updated my config to the following settings:
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
The flush size is at 128m and there is no memory pressure.
You mean there is enough memstore-reserved heap in the RS, so that there
won't be premature flushes because of global heap pressure? What is the RS
max mem, and how many regions and CFs in each? Can you check whether the
flushes are happening
Thanks for the quick response Anoop.
The current memstore reservation (IIRC) would be 0.35 of total heap, right?
The RS total heap is 10231MB, used is at 5000MB. The total number of regions
is 217: approx 150 regions with 2 families, ~60 with 1 family, and the
remaining with 3 families.
How to
If the memstore global up-limit is reached, you'll find "Blocking updates on" in
your log files (see MemStoreFlusher.reclaimMemStoreMemory);
If it's caused by too many log files, you'll find "Too many hlogs: logs=" (see
HLog.cleanOldLogs).
Hope it's helpful for you :)
Best,
Liang
Thanks Liang!
Found the logs. I had gone overboard with my greps and missed the "Too
many hlogs" line for the regions that I was trying to debug.
A few sample log lines:
2013-06-27 07:42:49,602 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Too many hlogs: logs=33, maxlogs=32; forcing flush
The config hbase.regionserver.maxlogs specifies the max number of logs and
defaults to 32. But remember that if there are many log files to replay,
the MTTR will increase (the RS-down case).
-Anoop-
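If write bursts routinely push past the limit, one option is raising it in hbase-site.xml, in the same style as the snippet earlier in this digest (64 is only an illustrative value, and a higher cap trades off against the replay/MTTR cost Anoop describes):

<property>
  <name>hbase.regionserver.maxlogs</name>
  <value>64</value>
</property>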
On Thu, Jun 27, 2013 at 1:59 PM, Viral Bajaria viral.baja...@gmail.com wrote:
Thanks Liang!
hey Viral,
Which hbase version are you using?
On Thu, Jun 27, 2013 at 5:03 PM, Anoop John anoop.hb...@gmail.com wrote:
The config hbase.regionserver.maxlogs specifies the max number of logs and
defaults to 32. But remember that if there are many log files to replay,
the MTTR will become
0.94.4 with plans to upgrade to the latest 0.94 release.
On Thu, Jun 27, 2013 at 2:22 AM, Azuryy Yu azury...@gmail.com wrote:
hey Viral,
Which hbase version are you using?
Can you paste your JVM options here? And do you have extensive writes on
your HBase cluster?
On Thu, Jun 27, 2013 at 5:47 PM, Viral Bajaria viral.baja...@gmail.com wrote:
0.94.4 with plans to upgrade to the latest 0.94 release.
On Thu, Jun 27, 2013 at 2:22 AM, Azuryy Yu azury...@gmail.com
I do have a heavy write operation going on. Actually, heavy is relative. Not
all tables/regions are seeing the same amount of writes at the same time.
There is definitely a burst of writes that can happen on some regions. In
addition to that, there are some processing jobs which play catch-up and
BTW, don't use CMSIncrementalMode; IIRC, it has actually been removed from
upstream HotSpot.
From: Viral Bajaria [viral.baja...@gmail.com]
Sent: June 27, 2013 18:08
To: user@hbase.apache.org
Subject: Re: Reply: flushing + compactions after config change
I do have a
I think you will need to update your hash function and redistribute the data.
As far as I know, this has been one of the drawbacks of this approach (and
the Sematext library).
Regards,
Shahab
On Wed, Jun 26, 2013 at 7:24 PM, Joarder KAMAL joard...@gmail.com wrote:
Maybe a simple question to answer
Thanks Shahab for the reply. I was also thinking in the same way.
Could you guide me to any reference which can confirm this
understanding?
Regards,
Joarder Kamal
On 27 June 2013 23:24, Shahab Yunus shahab.yu...@gmail.com wrote:
I think you will need to update your hash
Hello,
I am a bit confused about how the configurations of HBase replication and DFS
replication work together.
My application deploys on an HBase cluster (0.94.3) with two Region
Servers. The two Hadoop datanodes run on the same two Region Servers.
Because we only have two datanodes, dfs.replication was
Jason,
HBase replication is for between two HBase clusters as you state.
What you are seeing is merely the expected behavior within a single
cluster. DFS replication is not involved directly here - the shell ends up
acting like any other HBase client and constructing the scan the same way
(i.e.
Makes a lot of sense.
Thanks Dave,
Jason
On Thu, Jun 27, 2013 at 10:26 AM, Dave Wang d...@cloudera.com wrote:
Jason,
HBase replication is for between two HBase clusters as you state.
What you are seeing is merely the expected behavior within a single
cluster. DFS replication is not
Your JVM options are not enough. I will give you some details when I get
back to the office tomorrow.
--Sent from my Sony mobile.
On Jun 27, 2013 6:09 PM, Viral Bajaria viral.baja...@gmail.com wrote:
I do have a heavy write operation going on. Actually heavy is relative. Not
all tables/regions are
I don't have a particular document or source stating this, but I think it is
actually kind of self-explanatory if you think about the algorithm.
Anyway, you can read this
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
And
Not an easy task.
You first need to determine how you want to store the data within a column
and/or apply a type constraint to a column.
Even if you use JSON records to store your data within a column, does an
equality comparator exist? If not, you would have to write one.
(I kinda think
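For plain byte-for-byte equality on a cell value, the stock comparators already work; a hedged sketch (the family, qualifier, and JSON string are made up for illustration; comparing individual fields inside the JSON would still need a custom comparator, as noted above):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class EqualityFilterSketch {
  public static Scan buildScan() {
    // Matches rows whose cf:col cell equals the given bytes exactly.
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("cf"),   // example family
        Bytes.toBytes("col"),  // example qualifier
        CompareFilter.CompareOp.EQUAL,
        new BinaryComparator(Bytes.toBytes("{\"id\":42}")));
    Scan scan = new Scan();
    scan.setFilter(filter);
    return scan;
  }
}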
Howdy,
I want to take a look at a MR job which seems to be slower than I had
hoped. Mind you, this MR job is only running on a pseudo-distributed VM
(cloudera cdh4).
I have modified my mapred-site.xml with the following (that last one is
commented out because it crashes my MR job):
Thanks Azuryy. Look forward to it.
Does DEFERRED_LOG_FLUSH impact the number of WAL files that will be created?
I tried looking around but could not find the details.
On Thu, Jun 27, 2013 at 7:53 AM, Azuryy Yu azury...@gmail.com wrote:
Your JVM options are not enough. I will give you some detail
No, all your data eventually makes it into the log, just potentially
not as quickly :)
J-D
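For context, deferred log flush is a per-table flag in 0.94; a hedged sketch of setting it at table-creation time (the table and family names are only examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DeferredFlushSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor htd = new HTableDescriptor("mytable"); // example table
    htd.addFamily(new HColumnDescriptor("cf"));
    // WAL edits are flushed on a timer instead of per-write; a small
    // window of edits can be lost on a crash, but they still go into
    // the same HLog files, as J-D notes above.
    htd.setDeferredLogFlush(true);
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(htd);
    admin.close();
  }
}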
On Thu, Jun 27, 2013 at 2:06 PM, Viral Bajaria viral.baja...@gmail.com wrote:
Thanks Azuryy. Look forward to it.
Does DEFERRED_LOG_FLUSH impact the number of WAL files that will be created
? Tried
I realize standard comparators cannot solve this.
However, I do know the type of each column, so writing custom list
comparators for boolean, char, byte, short, int, long, float, double seems
quite straightforward.
Long arrays, for example, are stored as a byte array with 8 bytes per item,
so a
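A hedged sketch of the decoding step such a comparator would build on, using HBase's Bytes utility (the layout of consecutive 8-byte big-endian longs with no header follows the description above; the rest is illustrative):

import org.apache.hadoop.hbase.util.Bytes;

public class LongArrayCodecSketch {
  // Decodes a value laid out as consecutive 8-byte longs.
  public static long[] decode(byte[] value) {
    long[] items = new long[value.length / Bytes.SIZEOF_LONG];
    for (int i = 0; i < items.length; i++) {
      items[i] = Bytes.toLong(value, i * Bytes.SIZEOF_LONG);
    }
    return items;
  }

  // Example predicate a custom list comparator might evaluate.
  public static boolean contains(byte[] value, long target) {
    for (long item : decode(value)) {
      if (item == target) {
        return true;
      }
    }
    return false;
  }
}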
You have to remember that HBase doesn't enforce any sort of typing.
That's why this can be difficult.
You'd have to write a coprocessor to enforce a schema on a table.
Even then YMMV if you're writing JSON structures to a column because while the
contents of the structures could be the same,
I see your point. Everything is just bytes.
However, the schema is known and every row is formatted according to this
schema, although some columns may not exist, that is, no value exists for
this property on this row.
So if I'm able to apply these typed comparators to the right cell values,
it may
Phoenix, Hive, Pig, Java would all work.
But to Azuryy Yu's post...
The OP is doing a simple scan() to get rows.
If the OP is hitting an OOM exception, then it's a code issue on the part of
the OP.
On Jun 27, 2013, at 2:22 AM, Azuryy Yu azury...@gmail.com wrote:
Sorry, maybe Phoenix is not
Ok...
If you want to do type checking and schema enforcement...
You will need to do this as a coprocessor.
The quick and dirty way (not recommended) would be to hard-code the schema
into the coprocessor code.
A better way... at start up, load up ZK to manage the set of known table
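A minimal sketch of that coprocessor idea with the schema hard-coded (the quick-and-dirty variant; 0.94-style API assumed, and the column name is only an example):

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.DoNotRetryIOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class SchemaEnforcerSketch extends BaseRegionObserver {
  // Hard-coded schema; a real version would load this from ZK as described above.
  private static final byte[] LONG_COL = Bytes.toBytes("count"); // example column

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> e,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    for (List<KeyValue> kvs : put.getFamilyMap().values()) {
      for (KeyValue kv : kvs) {
        // Reject writes whose value length doesn't match the declared type.
        if (Bytes.equals(kv.getQualifier(), LONG_COL)
            && kv.getValueLength() != Bytes.SIZEOF_LONG) {
          throw new DoNotRetryIOException("count must be an 8-byte long");
        }
      }
    }
  }
}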
Hey JD,
Thanks for the clarification. I also came across a previous thread which
sort of talks about a similar problem.
http://mail-archives.apache.org/mod_mbox/hbase-user/201204.mbox/%3ccagptdnfwnrsnqv7n3wgje-ichzpx-cxn1tbchgwrpohgcos...@mail.gmail.com%3E
I guess my problem is also similar to
Thanks for your help Mike. Much appreciated.
I don't store rows/columns in JSON format. The schema is exactly that of a
specific Java class, where the rowkey is a unique object identifier with
the class type encoded into it. Columns are the field names of the class
and the values are that of the
Hi Viral,
the following are all needed for CMS:
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSClassUnloadingEnabled
-XX:CMSMaxAbortablePrecleanTime=300
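As a usage note, those flags would typically be appended to HBASE_OPTS in conf/hbase-env.sh, e.g. (abbreviated to the first few flags from the list above):

export HBASE_OPTS="$HBASE_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70"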
Hi Kristoffer,
Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You
could model your schema much like an O/R mapper and issue SQL queries through
Phoenix for your filtering.
James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com
On Jun 27, 2013, at 4:39 PM, Kristoffer
ATT
Your row can be very wide.
Take a look at the first paragraph in this comment:
https://issues.apache.org/jira/browse/HBASE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633620#comment-13633620
Cheers
On Fri, Jun 28, 2013 at 10:40 AM, ch huang
I would suggest you write a custom load balancer and then have your
hashing algo determine how the load balancing should happen. Hope this
helps.
Regards
Ram
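A hedged sketch of where such a balancer would plug in (0.94-era API assumed; the class just delegates, and the actual hash-aware placement logic is the part you would fill in):

import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.master.DefaultLoadBalancer;
import org.apache.hadoop.hbase.master.RegionPlan;

public class HashAwareBalancerSketch extends DefaultLoadBalancer {
  @Override
  public List<RegionPlan> balanceCluster(
      Map<ServerName, List<HRegionInfo>> clusterState) {
    // A real implementation would group regions by hash bucket here and
    // emit RegionPlans that spread or co-locate buckets as desired.
    return super.balanceCluster(clusterState);
  }
}

It would then be registered via the hbase.master.loadbalancer.class property in hbase-site.xml.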
On Fri, Jun 28, 2013 at 5:32 AM, Joarder KAMAL joard...@gmail.com wrote:
Thanks St.Ack for mentioning the
Hi all:
Can HBase start more than one instance, like MySQL? If it can, how do I
manage these instances? Thanks a lot.
ATT
Hi,
How many column families should there be in an HBase table? Is there any
performance issue in read/write if we have more column families?
I have designed one table with around 14 column families in it, with each
having on average 6 qualifiers.
Is it a good design?
--
Thanks and Regards,
HBase regions are stored as HFiles, and HFiles are stored on HDFS datanodes.
Thanks, Sandeep.
Date: Fri, 28 Jun 2013 13:08:58 +0800
Subject: what's the relationship between hadoop datanode and hbase region
node?
From: justlo...@gmail.com
To: user@hbase.apache.org
ATT