Re: timeouts with lots of coprocessor puts on single row

2013-08-27 Thread anil gupta
On Mon, Aug 26, 2013 at 10:56 PM, Olle Mårtensson olle.martens...@gmail.com
 wrote:

 Thank you for the link, Anil; it was a good explanation indeed.

 It's not recommended to do puts/deletes across
 region servers like this.

 That was not my intention. I want to keep the region for the aggregates
 and the aggregated values on the same server. I read in the link that you
 gave me that I can achieve this by using a coprocessor on the master, so I
 will try that out.

 Try to move this aggregation to the client side,
 or at least outside the RS.

 This is what I'm trying to avoid, since doing it would cause big data
 transfers between the client and the region server. The whole purpose of
 using the coprocessor is to push the aggregation work to the nodes where
 the data is local and to minimize data transfer between the nodes.

 Why do you think it's a bad idea to compute aggregate values inside the
 region server? Is it because it occupies RPC threads, or because it's not
 a good use case for coprocessors?

I got the impression that your code is doing Inter-RS puts/gets from the
coprocessor.

 Do you think it's a bad idea even if I keep the regions for the two rows
 involved on the same regionserver and bypass RPC as the link suggests?

In my opinion, it should be fine then. I am not aware of how heavy/complex
your aggregations are. Obviously, the more complex your CP (coprocessor) is,
the more load you are putting on the RS.


 Thanks // Olle


 On Mon, Aug 26, 2013 at 5:43 PM, anil gupta anilgupt...@gmail.com wrote:

  On Mon, Aug 26, 2013 at 7:27 AM, Olle Mårtensson
  olle.martens...@gmail.comwrote:
 
   Hi,
  
   I have developed a coprocessor that extends BaseRegionObserver and
   implements the postPut method. The postPut method scans the columns of
   the row that the put was issued on and calculates an aggregate based on
   these values; when this is done, a row in another table is updated with
   the aggregated value.
  
  This is an anti-pattern. It's not recommended to do puts/deletes across
  region servers like this. Try to move this aggregation to the client side,
  or at least outside the RS. Here is a link to a more detailed explanation
  of why this is not good: http://search-hadoop.com/m/XtAi5Fogw32
 
   This works out fine until I put some stress on one row; then the threads
   on the regionserver hosting the table will freeze on flushing the put of
   the aggregated value. The client application basically does 100
   concurrent puts on one row in a tight loop (on the table where the
   coprocessor is activated). After that the client sleeps for a while and
   tries to fetch the aggregated value, and here the client freezes and
   periodically burps out exceptions. It works if I don't run so many puts
   in parallel.
  
   The HBase environment is pseudo-distributed 0.94.11 with one
   regionserver.
  
   I have tried using a connection pool in the coprocessor, bumped up the
   heap size of the regionserver, and also upped the number of RPC threads
   for the regionserver, but without luck.
  
   The pseudo code for postPut would be something like this:

   vals = env.getRegion().get(get).getFamilyMap().values()  // read the row locally
   agg_val = aggregate(vals)                                // compute the aggregate
   agg_table = env.getTable("aggregates")                   // open the aggregates table
   agg_table.setAutoFlush(false)
   put = new Put(row)                                       // row key of the aggregate row
   put.add(agg_val)
   agg_table.put(put)
   agg_table.flushCommits()
   agg_table.close()
  
   And the real Clojure variant is:
  
   https://gist.github.com/ollez/d0450930a591912aea5d#file-gistfile1-clj
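
For reference, a rough Java equivalent of that pseudo code might look like the
sketch below. This is only an illustration, assuming the HBase 0.94 coprocessor
API; the column family d, the qualifier agg, and the sum-based aggregate are
placeholder assumptions, not Olle's actual code.

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class AggregatingObserver extends BaseRegionObserver {

  private static final byte[] AGG_TABLE = Bytes.toBytes("aggregates");
  private static final byte[] FAMILY = Bytes.toBytes("d"); // placeholder family

  @Override
  public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    byte[] row = put.getRow();

    // Read the whole row back from the local region (no RPC).
    Result result = ctx.getEnvironment().getRegion().get(new Get(row), null);
    List<KeyValue> kvs = result.list();
    if (kvs == null) {
      return; // nothing to aggregate
    }
    long aggregate = 0;
    for (KeyValue kv : kvs) {
      aggregate += Bytes.toLong(kv.getValue()); // assumes 8-byte long values
    }

    // Write the aggregate to the second table. getTable() goes through the
    // normal client path, so this is only cheap when the target region is
    // hosted by the same region server.
    HTableInterface aggTable = ctx.getEnvironment().getTable(AGG_TABLE);
    try {
      Put aggPut = new Put(row);
      aggPut.add(FAMILY, Bytes.toBytes("agg"), Bytes.toBytes(aggregate));
      aggTable.put(aggPut);
    } finally {
      aggTable.close();
    }
  }
}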
  
   The hbase-site.xml:
  
   https://gist.github.com/ollez/d0450930a591912aea5d#file-hbase-site-xml
  
   The regionserver stacktrace:

   https://gist.github.com/ollez/d0450930a591912aea5d#file-regionserver-stacktrace
  
   The client exceptions:

   https://gist.github.com/ollez/d0450930a591912aea5d#file-client-exceptions
  
   Thanks // Olle
  
 
 
 
  --
  Thanks & Regards,
  Anil Gupta
 




-- 
Thanks & Regards,
Anil Gupta


HBase-Hive integration performance issues

2013-08-27 Thread Hao Ren

Hi,

I am running Hive and HBase on Amazon EC2. By following the tutorial
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration, I
managed to create an HBase table from Hive and insert data into it.


It works, but performance is low. To be specific, inserting 1.3 GB
(50 M rows, 3 columns) takes 30 minutes. That is far from what I expected,
say 100 s.


Actually, my EC2 cluster contains 3 slaves and 1 master, whose instance
type is medium (http://aws.amazon.com/ec2/instance-types/#instance-type).


Hadoop 1.0.4 is installed on my cluster. HBase is in pseudo-distributed 
mode. A region server is running on the master. HDFS is used as storage.


Here are some configuration files:

*// hive-site.xml*

<configuration>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-10-178-13-39.ec2.internal</value>
  </property>

  <property>
    <name>hive.aux.jars.path</name>
    <value>/root/hive/build/dist/lib/hive-hbase-handler-0.9.0-amplab-4.jar,/root/hive/build/dist/lib/hbase-0.92.0.jar,/root/hive/build/dist/lib/zookeeper-3.4.3.jar,/root/hive/build/dist/lib/guava-r09.jar</value>
  </property>

  <property>
    <name>hbase.client.scanner.caching</name>
    <value>1</value>
  </property>

</configuration>

*// hbase-site.xml*

<configuration>

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ec2-54-226-206-28.compute-1.amazonaws.com:9010/hbase</value>
  </property>

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ip-10-178-13-39.ec2.internal</value>
  </property>

  <property>
    <name>hbase.client.scanner.caching</name>
    <value>1</value>
  </property>

</configuration>

*For understanding, I have some questions:*
1) In order to improve read performance, I have set
hbase.client.scanner.caching to 1. But I don't know how to improve
write performance. Is there some basic configuration to tune?
2) Does the distributed mode matter? Does fully-distributed mode have
better write performance than pseudo-distributed mode?
3) If the number of region servers is increased, will the write
performance be improved?
4) In pseudo-distributed mode (one HBase daemon on the master), when
writing data from Hive to an HBase table, is the master the only entry
point to HBase? I don't think having all the data pass through the master
is efficient. I wonder whether it is possible to write data in parallel
from Hive to HBase directly using MapReduce?

5) Will HBase bulk loading help a lot?

I am new to HBase, but I really want to integrate HBase in production.

Any help is highly appreciated! =)

Hao

--
Hao Ren
ClaraVista
www.claravista.fr



Data Deduplication in HBase

2013-08-27 Thread Anand Nalya
Hi,

I have a use case in which I need to store segments of mp3 files in HBase.
A song may come to the application in different overlapping segments. For
example, a 5 min song can have the following segments: 0-1, 0.5-2, 2-4,
3-5. As can be seen, some of the data is duplicated (3-4 is present in the
last 2 segments).

What would be the ideal way of removing this duplicate storage? Will Snappy
compression help here, or do I need to write some logic on top of HBase?
Also, what if I store a single segment multiple times? Will HBase do some
sort of deduplication?

Regards,
Anand


Re: HBase-Hive integration performance issues

2013-08-27 Thread Matt Davies
Hao,

A couple of thoughts here.

This could be related to many things.
1. Did you pre-split your regions? If not, you could be hot-spotting on a
single server and then waiting for the region to split. If that is the
case, you could actually be using only a single server for much of your
load (if not all; it depends on the region size you have configured). While
running, did you see one system take the full load (via top, Ganglia, or
some other tool)?

2. The memory on each of these systems is quite low: 1.7 or 3.7 GB,
depending on whether it is a compute or a memory instance. Either way, it
is very low, and I'd expect you to be doing a lot of swapping. You'll need
about 1 GB for each daemon, which leaves you very little room for the OS
(at 3.7 GB). Do you see swapping? What are your JVM parameters?

3. Do these same 4 servers run your Hadoop infrastructure and the Hive
query? If so, the system is woefully underpowered if you expect to see
production-like speed. Running a Hive query on top of an HBase cluster
with so few resources will just not work out well in the end ;)


-Matt
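
On Matt's first point, here is a minimal sketch of pre-splitting a table at
creation time with the 0.9x Java client API, so writes are spread across
region servers from the start. The table name, column family, and split
points below are placeholders, not values from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("mytable"); // placeholder
    desc.addFamily(new HColumnDescriptor("cf"));             // placeholder
    // Four split points give five regions; pick split points that match
    // your real row-key distribution.
    byte[][] splits = new byte[][] {
        Bytes.toBytes("2"), Bytes.toBytes("4"),
        Bytes.toBytes("6"), Bytes.toBytes("8") };
    admin.createTable(desc, splits);
    admin.close();
  }
}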





Re: Data Deduplication in HBase

2013-08-27 Thread Ted Yu
bq.  Will hbase do some sort of deduplication?

I don't think so.

What is the granularity of segment overlap? In the above example, it seems
to be 0.5.

Cheers
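
One application-level approach (HBase will not do this by itself): cut the
audio into aligned fixed-size chunks and key each chunk by a content hash, so
an identical chunk always maps to the same row however often it is written.
The sketch below is only an illustration under that assumption; the table
layout and hash choice are not from this thread.

import java.security.MessageDigest;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ChunkStore {
  // Store one chunk under its content hash. Re-writing the same chunk just
  // rewrites the same cell (older versions disappear at compaction), so the
  // chunk is effectively stored once. A second table mapping
  // (songId, offset) -> chunk hash would let you reassemble the segments.
  public void storeChunk(HTableInterface chunkTable, byte[] chunk)
      throws Exception {
    byte[] rowKey = MessageDigest.getInstance("SHA-256").digest(chunk);
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("c"), Bytes.toBytes("data"), chunk);
    chunkTable.put(put);
  }
}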





Re: HBase-Hive integration performance issues

2013-08-27 Thread Hao Ren

Matt,

Thank you for the lightning-fast reply.

I will try out what you have mentioned over the next few days, and then I
can report back on the issue in detail.


Thank you again. Your suggestions show me the way. =)

Hao






--
Hao Ren
ClaraVista
www.claravista.fr


[Question: replication] why only one regionserver is used during replication? 0.94.9

2013-08-27 Thread Demai Ni
hi, guys,

I am using HBase 0.94.9, and I have set up replication from a 4-node
master cluster (3 regionservers) to a 3-node slave cluster (2
regionservers).

I can tell that all source regionservers can successfully replicate data.
However, it seems that for each particular table, only one regionserver
handles its replication at any given time.

For example, I am using YCSB to load 1,000,000 rows with workloada, with 16
threads. During the load period, I looked at the ageOfLastShippedOp and
sizeOfLogQueue metrics. I can tell that one of the regionservers on the
master cluster is doing the replication. While the values of both the age
and the log queue size are growing, the other two regionservers don't come
in to help.

So does that mean that, for each table and process, only one regionserver
will do the replication regardless of how long the queue is? Or did I miss
some setup configuration?

Thanks.

Demai


Re: [Question: replication] why only one regionserver is used during replication? 0.94.9

2013-08-27 Thread Jean-Daniel Cryans
Region servers replicate data written to them, so look at how your regions
are distributed.

J-D
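
To follow up on that, a minimal sketch of dumping the region-to-server
assignment with the 0.94 client API; the table name usertable is an
assumption (the YCSB default), not something stated in the thread.

import java.util.Map;
import java.util.NavigableMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HTable;

public class RegionLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "usertable");
    // Region -> hosting region server; replication load follows this map.
    NavigableMap<HRegionInfo, ServerName> locs = table.getRegionLocations();
    for (Map.Entry<HRegionInfo, ServerName> e : locs.entrySet()) {
      System.out.println(e.getKey().getRegionNameAsString()
          + " -> " + e.getValue().getHostname());
    }
    table.close();
  }
}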





Fwd: Hbase 0.94.6 stargate can't use multi get

2013-08-27 Thread Dmitriy Troyan
Hey all,

I am trying to use multi get to retrieve different versions of a row, but
it always gives me only one. For example, I have a table log and a column
family data. I put a lot of versions of row/data:get. Now I try to get all
versions of this key.

As described in the manual at http://wiki.apache.org/hadoop/Hbase/Stargate
(Cell or Row Query (Multiple Values)), I use the browser for the request:
http://myhost.com:8080/log/data:get/0,1377633354/?v=10

The response contains only one version of the key instead of giving me all
(max 10) available versions.

I've been racking my brain over this problem. Please point me in the right
direction.

Thanks!


Re: [Question: replication] why only one regionserver is used during replication? 0.94.9

2013-08-27 Thread Demai Ni
J-D, thanks for the tip.






Writing multiple tables from reducer

2013-08-27 Thread jamal sasha
Hi,
  I am new to HBase and am trying to achieve the following.

I am reading data from HDFS in the mapper and parsing it.

So, in the reducer I want my output to be written to HBase instead of HDFS.
But here is the thing.

public static class MyTableReducer extends TableReducer<Text, Text,
ImmutableBytesWritable> {

 public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
int type = getType(values.toString());
if (type == 1) // put data to table 1
if (type == 2) // put data to table 2


   }
}

How do I do this?
Thanks


Re: Writing multiple tables from reducer

2013-08-27 Thread Harsh J
You can use HBase's MultiTableOutputFormat:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.html

An example can be found in this blog post:
http://www.wildnove.com/2011/07/19/tutorial-hadoop-and-hbase-multitableoutputformat/
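
A minimal sketch of how that looks in the reducer: with MultiTableOutputFormat
the output key names the destination table, so one reducer can feed several
tables. getType() stands in for the routing logic from the original post, and
the table/family/qualifier names are placeholders.

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MultiTableExample {
  public static class MyTableReducer
      extends Reducer<Text, Text, ImmutableBytesWritable, Put> {

    private static final ImmutableBytesWritable TABLE1 =
        new ImmutableBytesWritable(Bytes.toBytes("table1")); // placeholder
    private static final ImmutableBytesWritable TABLE2 =
        new ImmutableBytesWritable(Bytes.toBytes("table2")); // placeholder

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      for (Text value : values) {
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
            Bytes.toBytes(value.toString()));
        // The output key selects the destination table for this Put.
        context.write(getType(value.toString()) == 1 ? TABLE1 : TABLE2, put);
      }
    }

    private int getType(String s) {
      return s.startsWith("1") ? 1 : 2; // placeholder routing logic
    }
  }
  // Driver side, as the javadoc describes:
  // job.setOutputFormatClass(MultiTableOutputFormat.class);
}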




-- 
Harsh J


Region locality and core/thread availability

2013-08-27 Thread Kiru Pakkirisamy


I think my app wants to hit a particular region all the time, since my
table is read-only (a lookup table). I have created more than one copy of
the table and randomly pick one to use; this way I can load the whole
cluster. Is there a feature in HBase which lets coprocessors run on another
region server, on another copy of the region, if the originally assigned
region server is busy, for read-only operations or if we mark the table
read-only? I guess not.
Anyway, any tools/debugging tips to confirm this hot-region behavior?

 
Regards,
- kiru


Kiru Pakkirisamy | webcloudtech.wordpress.com




how to export data from hbase to mysql?

2013-08-27 Thread ch huang
hi, all:
any good ideas? Thanks!


Re: how to export data from hbase to mysql?

2013-08-27 Thread Jean-Marc Spaggiari
Take a look at sqoop?
On 2013-08-27 23:08, ch huang justlo...@gmail.com wrote:

 hi, all:
 any good ideas? Thanks!



Re: how to export data from hbase to mysql?

2013-08-27 Thread James Taylor
Or if you'd like to be able to use SQL directly on it, take a look at
Phoenix (https://github.com/forcedotcom/phoenix).

James

On Aug 27, 2013, at 8:14 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:

 Take a look at sqoop?
 On 2013-08-27 23:08, ch huang justlo...@gmail.com wrote:

 hi, all:
 any good ideas? Thanks!



Re: Hbase 0.94.6 stargate can't use multi get

2013-08-27 Thread Ravi Kiran
Hi,
Can you please query the schema of the table and show it here? I would like
to know what value of VERSIONS you have set for the column family. I hope
you have set it to 10.

Ex: http://myhost.com:8080/log/schema


Regards
Ravi Magham
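
If the family turns out to still be at the default (3 versions in 0.94), a
minimal sketch of raising VERSIONS to 10 with the 0.9x admin API, using the
table and family names from the thread. Note that this builds a fresh
descriptor, so any other non-default family settings would need to be set
again.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RaiseVersions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HColumnDescriptor cf = new HColumnDescriptor("data");
    cf.setMaxVersions(10); // keep up to 10 versions per cell
    admin.disableTable("log");
    admin.modifyColumn("log", cf);
    admin.enableTable("log");
    admin.close();
  }
}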





Re: how to export data from hbase to mysql?

2013-08-27 Thread ch huang
Sqoop does not support exporting HBase data into MySQL.

On Wed, Aug 28, 2013 at 11:13 AM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Take a look at sqoop?
  On 2013-08-27 23:08, ch huang justlo...@gmail.com wrote:

  hi, all:
  any good ideas? Thanks!