HBase MR - key/value mismatch

2013-09-05 Thread Omkar Joshi
I'm trying to execute an MR job over stand-alone HBase (0.94.11). I have read the
HBase API and modified my MR code to read data, but I am getting exceptions in the
Reduce phase.

The exception I get is:

13/09/05 16:16:17 INFO mapred.JobClient:  map 0% reduce 0%
13/09/05 16:23:31 INFO mapred.JobClient: Task Id : attempt_201309051437_0005_m_00_0, Status : FAILED

java.io.IOException: wrong key class: class org.apache.hadoop.hbase.io.ImmutableBytesWritable is not class org.apache.hadoop.io.Text
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:164)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1168)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1492)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at com.hbase.mapreduce.SentimentCalculationHBaseReducer.reduce(SentimentCalculationHBaseReducer.java:199)
    at com.hbase.mapreduce.SentimentCalculationHBaseReducer.reduce(SentimentCalculationHBaseReducer.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1513)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)



Providing the partial code (business logic excluded):


Mapper:


public class SentimentCalculationHBaseMapper extends TableMapper<Text, Text> {

    private Text sentenseOriginal = new Text();
    private Text sentenseParsed = new Text();

    @Override
    protected void map(
            ImmutableBytesWritable key,
            Result value,
            org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable, Result, Text, Text>.Context context)
            throws IOException, InterruptedException {
        context.write(this.sentenseOriginal, this.sentenseParsed);
    }
}

Reducer:


public class SentimentCalculationHBaseReducer extends
        TableReducer<Text, Text, ImmutableBytesWritable> {

    @Override
    protected void reduce(
            Text key,
            java.lang.Iterable<Text> values,
            org.apache.hadoop.mapreduce.Reducer<Text, Text, ImmutableBytesWritable, org.apache.hadoop.io.Writable>.Context context)
            throws IOException, InterruptedException {

        Double mdblSentimentOverall = 0.0;

        String d3 = key + "@12321@" + s11.replaceFirst(":::", "")
                + "@12321@" + mstpositiveWords + "@12321@"
                + mstnegativeWords + "@12321@" + mstneutralWords;

        System.out.println("d3 : " + d3 + " , mdblSentimentOverall : "
                + mdblSentimentOverall);

        Put put = new Put(d3.getBytes());

        put.add(Bytes.toBytes("word_attributes"),
                Bytes.toBytes(mdblSentimentOverall),
                Bytes.toBytes(mdblSentimentOverall));

        System.out.println("Context is " + context);

        context.write(new ImmutableBytesWritable(d3.getBytes()), put);
    }
}

SentimentCalculatorHBase - the Tool/main class :
package com.hbase.mapreduce;

import java.util.Calendar;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SentimentCalculatorHBase extends Configured implements Tool {

/**
 * @param args
 * @throws Exception
 */
public static void main(String[] args) throws Exception {
// TODO Auto-generated method stub
SentimentCalculatorHBase sentimentCalculatorHBase = new SentimentCalculatorHBase();
ToolRunner.run(sentimentCalculatorHBase, args);
}

@Override
public int run(String[] arg0) throws Exception {
// TODO Auto-generated method stub


System.out
.println(***Configuration 

Re: Suggestion need on designing Flatten table for HBase given scenario

2013-09-05 Thread Ted Yu
The attachment in your original email didn't go through. 

Please put it on some website so that everyone can see it. 

Thanks

On Sep 4, 2013, at 10:24 PM, Ramasubramanian Narayanan 
ramasubramanian.naraya...@gmail.com wrote:

 Hi
 
 Have shared to you in Google +
 
 Can't you see that picture as an attachment in my earlier mail?
 
 Request you to confirm else I will resent the mail with attachment...
 
 regards,
 Rams
 
 
 On Thu, Sep 5, 2013 at 10:39 AM, Ted Yu yuzhih...@gmail.com wrote:
 
 I don't see image.
 
 Can you upload to some website ?
 
 Thanks
 
 On Sep 4, 2013, at 10:05 PM, Ramasubramanian Narayanan 
 ramasubramanian.naraya...@gmail.com wrote:
 
 
 Dear All,
 
 
 For the below 1 to many relationship column sets, require suggestion on
 how to design a Flatten HBase table... Kindly refer the attached image for
 the scenario...
 
 Pls let me know if my scenario is not clearly explained...
 
 regards,
 Rams
 


Re: HBase MR - key/value mismatch

2013-09-05 Thread Shahab Yunus
Try using Bytes.toBytes("your string") rather than String.getBytes().
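
A minimal sketch of that change against the posted reducer (d3, "word_attributes" and mdblSentimentOverall come from the code in the original mail; the "sentiment" qualifier is only an illustrative name, not something from the thread):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;

// inside reduce():
byte[] row = Bytes.toBytes(d3);                      // explicit UTF-8, unlike String.getBytes()
Put put = new Put(row);
put.add(Bytes.toBytes("word_attributes"),            // column family
        Bytes.toBytes("sentiment"),                  // qualifier (illustrative name)
        Bytes.toBytes(mdblSentimentOverall));        // double encoded with the Bytes utility
context.write(new ImmutableBytesWritable(row), put);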

Regards,
Shahab


On Thu, Sep 5, 2013 at 2:16 AM, Omkar Joshi omkar.jo...@lntinfotech.com wrote:

 I'm trying to execute an MR job over stand-alone HBase (0.94.11). I have
 read the HBase API and modified my MR code to read data, but I am getting
 exceptions in the Reduce phase.


Re: user action modeling

2013-09-05 Thread Shahab Yunus
Your read queries seem to be more driven from the 'action' and 'object'
perspective, rather than the user.

1- So one option is that you make a composite key with action and object:

action|object, and the columns are the users who are generating events on this
combination. You can scan using a prefix filter if you want to look at data for a
specific set of action and object, i.e. your requirements 1, 3 & 4. Key
distribution should be OK too. The drawbacks here are that a) you can end
up with really wide rows, and b) what if you want to store more information than
just the user id in the columns?
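
A rough sketch of that access pattern with the 0.94 client API (the table name, the "foo|bar" key and the loop body are made-up illustrations, not from the thread):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ActionObjectScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "user_actions");   // hypothetical table name
        byte[] prefix = Bytes.toBytes("foo|bar");          // action|object key prefix
        Scan scan = new Scan(prefix);                      // start scanning at the prefix
        scan.setFilter(new PrefixFilter(prefix));          // stop once rows leave the prefix
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // each column qualifier in the row would be a user id that hit foo|bar
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}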

The friends part is not that trivial and you have to maintain that
relationship out of this main table or create complex composite entities (I
need to think about it more, HBase is not a graph database.)

Regards,
Shahab


On Thu, Sep 5, 2013 at 1:16 AM, Marcos Sousa marcoscaixetaso...@gmail.com wrote:

 Hi,

 I have been working with HBase for the last 3 months; now I have to store user
 actions, at first look, using HBase.

 I have a limited number of actions, thousands of objects and about 50
 million users interacting with them, around 2 billion interactions per
 month.

 I have to answer these questions:
 How many users performed action 'foo' at object 'bar'
 What friends performed action 'foo' at object 'bar'
 What users made 'foo' at object 'bar' last week.
 What objects received more action 'foo'

 Does anybody have suggestions for a schema for this problem?

 Best regards,

 --
 Marcos Sousa



Programming practices for implementing composite row keys

2013-09-05 Thread praveenesh kumar
Hello people,

I have a scenario which requires creating composite row keys for my hbase
table.

Basically it would be entity1,entity2,entity3.

Search would be based on entity1 and then entity2 and 3. I know I can do a
row start-stop scan on entity1 first and then put row filters on entity2
and entity3.

My question is: what are the best programming principles to implement these
keys?

1. Just use simple delimiters: entity1:entity2:entity3.

2. Create complex datatypes like Java structures. I don't know if anyone
uses structures as keys, and if they do, can someone please point out
which scenarios they would be a good fit for? Do they fit well for this
scenario?

3. What are the pros and cons of both 1 and 2 when it comes to data
retrieval?

4. My entity1 can be negative also. Does it make any special difference
where HBase ordering is concerned? How can I tackle this scenario?

Any help on how to implement composite row keys would be highly helpful. I
want to understand how the community deals with implementing composite row
keys.

Regards
Praveenesh


Re: HBase MR - key/value mismatch

2013-09-05 Thread Ted Yu
public class SentimentCalculationHBaseReducer extends

TableReducer<Text, Text, ImmutableBytesWritable> {

The first type parameter for reducer should be ImmutableBytesWritable

Cheers


On Wed, Sep 4, 2013 at 11:16 PM, Omkar Joshi omkar.jo...@lntinfotech.com wrote:

 I'm trying to execute an MR job over stand-alone HBase (0.94.11). I have
 read the HBase API and modified my MR code to read data, but I am getting
 exceptions in the Reduce phase.


Re: user action modeling

2013-09-05 Thread Marcos Sousa
Hi,

Yes, that's the point, I need to save dynamic parameters for each action :(

I was thinking about distributing the data in 3 tables:
 - users: which has all data about the user plus the list of friends and
documents on which he performed the action
 - user_actions: to save the action and further parameters
 - objects: to save the list of users who performed the action.

Replicating data like that, I will have 3 times more writing operations.

Curiously, the intersect part, aka friends who did the same action, is one
of the most used parts.

Regards,

Marcos Sousa


On Thu, Sep 5, 2013 at 10:55 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

 Your read queries seem to be more driven from the 'action' and 'object'
 perspective, rather than the user.




Re: Programming practices for implementing composite row keys

2013-09-05 Thread Ted Yu
For #2 and #4, see HBASE-8693 'DataType: provide extensible type API' which
has been integrated to 0.96

Cheers


On Thu, Sep 5, 2013 at 7:14 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

 My 2 cents:

 1- Yes, that is one way to do it. You can also use fixed length for every
 attribute participating in the composite key. HBase scan would be more
 fitting to this pattern as well, I believe (?) It's a trade-off basically
 between space (all that padding increasing the key size) versus
 complexities involved in deciding and handling a delimiter and consequent
 parsing of keys etc.

 2- I personally have not heard about this. As far as I understand, this
 goes against the whole idea of HBase scanning and prefix and fuzzy filters
 will not be possible this way. This should not be followed.

 3- See replies to 1 & 2

 4- The sorting of the keys, by default, is binary comparator. It is a bit
 tricky as far as I know and the last I checked. Some tips here:

 http://stackoverflow.com/questions/17248510/hbase-filters-not-working-for-negative-integers

 Can you normalize them (or take an absolute) before reading and writing (of
 course at the cost of performance) if it is possible i.e. keys with same
 amount but different magnitude cannot exist as well as different entities.
 This depends on your business logic and type/nature of data.

 Regards,
 Shahab
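
A small sketch, not from the thread, of the order-preserving trick behind the normalization mentioned in point 4: flipping the sign bit makes HBase's unsigned byte comparison agree with signed int order (the class and method names are made up for illustration):

import org.apache.hadoop.hbase.util.Bytes;

public class OrderedIntEncoding {
    // Flip the sign bit so that negative values sort before positive ones
    // under HBase's unsigned, lexicographic byte comparison.
    public static byte[] encode(int value) {
        return Bytes.toBytes(value ^ Integer.MIN_VALUE);
    }

    public static int decode(byte[] bytes) {
        return Bytes.toInt(bytes) ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        // The encoding of -5 compares lower than the encoding of 3.
        System.out.println(Bytes.compareTo(encode(-5), encode(3)) < 0);  // true
    }
}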





Re: Programming practices for implementing composite row keys

2013-09-05 Thread Shahab Yunus
Ah! I didn't know about HBASE-8693. Good information. Thanks Ted.

Regards,
Shahab


On Thu, Sep 5, 2013 at 10:53 AM, Ted Yu yuzhih...@gmail.com wrote:

 For #2 and #4, see HBASE-8693 'DataType: provide extensible type API' which
 has been integrated to 0.96

 Cheers





Re: HBase MR - key/value mismatch

2013-09-05 Thread Ted Yu
The reducer also serves as a combiner, whose output would be sent to the reducer:

org.apache.hadoop.mapreduce.Reducer<Text, Text,
ImmutableBytesWritable, org.apache.hadoop.io.Writable>.Context context)

So the type parameters above should facilitate this.
Take a look at the PutCombiner from the HBase source code:

public class PutCombiner<K> extends Reducer<K, Put, K, Put> {

Cheers
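
A sketch of a driver consistent with that constraint (the table names are placeholders; the key point is that no combiner is registered, because a combiner would have to consume and emit the map output types Text/Text, not ImmutableBytesWritable/Put):

package com.hbase.mapreduce; // assumed, so the posted mapper/reducer resolve

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class SentimentJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "sentiment-calculation");
        job.setJarByClass(SentimentJobDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);          // larger scanner caching for MR scans
        scan.setCacheBlocks(false);    // don't pollute the block cache from MR

        // Map output types are Text/Text; the reducer writes Puts to the output table.
        TableMapReduceUtil.initTableMapperJob("input_table", scan,
                SentimentCalculationHBaseMapper.class, Text.class, Text.class, job);
        TableMapReduceUtil.initTableReducerJob("output_table",
                SentimentCalculationHBaseReducer.class, job);

        // Deliberately no job.setCombinerClass(...): the posted reducer emits
        // ImmutableBytesWritable/Put, so running it as a combiner triggers the
        // "wrong key class" IOException during the map-side spill.

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}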

On Thu, Sep 5, 2013 at 9:46 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

 Ted,

 Might be something very basic that I am missing, but why should the OP's
 reducer's key be of type ImmutableBytesWritable if he is emitting Text in
 the mapper? Thanks.

  protected void map(

 ImmutableBytesWritable key,

 Result value,

 org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,
 Result, Text, Text>.Context context)

 throws IOException, InterruptedException {

 context.write(this.sentenseOriginal, this.sentenseParsed);
 // sentenseOriginal is Text


 Regards,
 Shahab



Concurrent connections to Hbase

2013-09-05 Thread Kiru Pakkirisamy
Hi All,
I'd like to hear from users who are running a  big HBase setup with multiple 
concurrent connections.
Woud like to know the -# of cores/machines, # of queries. Get/RPCs , Hbase 
version etc.
We are trying to build an application with sub-second query performance (using 
coprocessors)  and want to scale it out to 10s of thousands of concurrent 
queries. We are now at 500-600 do see a bug like HBASE-9410 
Any positive/negative experiences in a similar situation ?

 
Regards,
- kiru


Kiru Pakkirisamy | webcloudtech.wordpress.com

Re: HBase MR - key/value mismatch

2013-09-05 Thread Shahab Yunus
Ted,

Might be something very basic that I am missing, but why should the OP's
reducer's key be of type ImmutableBytesWritable if he is emitting Text in
the mapper? Thanks.

 protected void map(

ImmutableBytesWritable key,

Result value,

org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,
Result, Text, Text>.Context context)

throws IOException, InterruptedException {

context.write(this.sentenseOriginal, this.sentenseParsed); // sentenseOriginal
is Text


Regards,
Shahab


On Thu, Sep 5, 2013 at 10:34 AM, Ted Yu yuzhih...@gmail.com wrote:

 public class SentimentCalculationHBaseReducer extends

 TableReducer<Text, Text, ImmutableBytesWritable> {

 The first type parameter for reducer should be ImmutableBytesWritable

 Cheers



Re: Concurrent connections to Hbase

2013-09-05 Thread James Taylor
Hey Kiru,
The Phoenix team would be happy to work with you to benchmark your
performance if you can give us specifics about your schema design, queries,
and data sizes. We did something similar for Sudarshan for a Bloomberg use
case here[1].

Thanks,
James

[1]. http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/34697


On Thu, Sep 5, 2013 at 10:28 AM, Kiru Pakkirisamy kirupakkiris...@yahoo.com
 wrote:

 Hi All,
 I'd like to hear from users who are running a big HBase setup with
 multiple concurrent connections.
 Would like to know the # of cores/machines, # of queries/Gets/RPCs, HBase
 version, etc.
 We are trying to build an application with sub-second query performance
 (using coprocessors) and want to scale it out to 10s of thousands of
 concurrent queries. We are now at 500-600 and do see a bug like HBASE-9410.
 Any positive/negative experiences in a similar situation?


 Regards,
 - kiru


 Kiru Pakkirisamy | webcloudtech.wordpress.com


[ANN]: HBase-Writer 0.94.0 available for download

2013-09-05 Thread R Smith
The HBase-Writer team is happy to announce that HBase-Writer 0.94.0 is
available for download:

http://code.google.com/p/hbase-writer/downloads/list

HBase-Writer 0.94.0 is a maintenance release that fixes library
compatibility with newer versions of Heritrix and HBase. More details may
be found on the HBase-Writer blog [1].

Users upgrading from previous versions of HBase-Writer are recommended to
upgrade to Heritrix 3.1.1 and HBase 0.95.2.

Thank you and Happy crawling :)

-The HBase-Writer Team

1.
http://opensourcemasters.com/weblog/general/hbase-writer-0.94.0-released.html


FILE_BYTES_READ counter missing for HBase mapreduce job

2013-09-05 Thread Haijia Zhou
Hi,
 Basically I have a mapreduce job that scans an HBase table and does some
processing. After the job finishes, I only get three filesystem counters:
HDFS_BYTES_READ, HDFS_BYTES_WRITTEN and FILE_BYTES_WRITTEN.
 The value of HDFS_BYTES_READ is not very useful here because it shows the
size of the .META. file, not the size of the input records.
 I am looking for the counter FILE_BYTES_READ but somehow it's missing from the
job status report.

 Does anyone know what I might miss here?

 Thanks
Haijia

P.S. The job status report (Map / Reduce / Total):

FileSystemCounters
  HDFS_BYTES_READ         340,124          0    340,124
  FILE_BYTES_WRITTEN      190,431,329      0    190,431,329
  HDFS_BYTES_WRITTEN      272,538,467,123  0    272,538,467,123


Re: FILE_BYTES_READ counter missing for HBase mapreduce job

2013-09-05 Thread Haijia Zhou
Additional info:
The mapreduce job I run is a map-only job. It does not have reducers and it
writes data directly to HDFS in the mapper.
 Could this be the reason why there's no value for FILE_BYTES_READ?
 If so, is there any easy way to get the total input data size?

 Thanks
Haijia
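
One low-tech option (a sketch; the counter group/name and the class name are made up for illustration) is to sum the raw KeyValue sizes inside the mapper with a custom counter and read it from the job status afterwards:

import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class SizeCountingMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        long bytes = 0;
        for (KeyValue kv : value.raw()) {      // every cell of the scanned row
            bytes += kv.getLength();           // key + value length of the cell
        }
        context.getCounter("HBaseScan", "RAW_BYTES_READ").increment(bytes);
        // ... existing row processing / direct HDFS writes go here ...
    }
}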


On Thu, Sep 5, 2013 at 2:46 PM, Haijia Zhou leons...@gmail.com wrote:

 Hi,
  Basically I have a mapreduce job to scan a hbase table and do some
 processing. After the job finishes, I only got three filesystem counters:
 HDFS_BYTES_READ, HDFS_BYTES_WRITTEN and FILE_BYTES_WRITTEN.
  The value of HDFS_BYTES_READ is not very useful here because it shows the
 size of the .META file, not the size of input records.
  I am looking for counter FILE_BYTES_READ but somehow it's missing in the
 job status report.

  Does anyone know what I might miss here?

  Thanks
 Haijia

 P.S. The job status report
  FileSystemCounters
 HDFS_BYTES_READ
340,124  0   340,124
 FILE_BYTES_WRITTEN
 190,431,329  0   190,431,329
 HDFS_BYTES_WRITTEN
 272,538,467,123  0   272,538,467,123



Re: Programming practices for implementing composite row keys

2013-09-05 Thread Doug Meil

Greetings, 

Other food for thought: some case studies on composite rowkey design are
in the refguide:

http://hbase.apache.org/book.html#schema.casestudies






On 9/5/13 12:15 PM, Anoop John anoop.hb...@gmail.com wrote:

Hi
  Have a look at Phoenix[1].  There you can define a composite RK
model and it handles the -ve number ordering.  Also the scan model u
mentioned will be well supported with start/stop RK on entity1 and
using SkipScanFilter
for others.

-Anoop-

[1] https://github.com/forcedotcom/phoenix





Re: Suggestion need on designing Flatten table for HBase given scenario

2013-09-05 Thread Doug Meil

Greetings,

The refguide has some case studies on composite rowkey design that might be 
helpful.

http://hbase.apache.org/book.html#schema.casestudies



From: Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com
Reply-To: user@hbase.apache.org
Date: Thursday, September 5, 2013 1:05 AM
To: user@hbase.apache.org
Subject: Suggestion need on designing Flatten table for HBase given scenario


Dear All,


For the below 1 to many relationship column sets, require suggestion on how to 
design a Flatten HBase table... Kindly refer the attached image for the 
scenario...

Pls let me know if my scenario is not clearly explained...

regards,
Rams



Re: Programming practices for implementing composite row keys

2013-09-05 Thread Anoop John
Hi
  Have a look at Phoenix [1]. There you can define a composite RK (row key)
model and it handles the negative number ordering. Also, the scan model you
mentioned will be well supported with a start/stop RK on entity1 and a
SkipScanFilter for the others.

-Anoop-

[1] https://github.com/forcedotcom/phoenix


On Thu, Sep 5, 2013 at 8:58 PM, Shahab Yunus shahab.yu...@gmail.com wrote:

 Ah! I didn't know about HBASE-8693. Good information. Thanks Ted.

 Regards,
 Shahab





what's different between numberOfStores and numberOfStorefiles in region status variables

2013-09-05 Thread ch huang
hi all:

  I check the region server status through
http://IP:60030/rs-status
and see that for some regions the two variables are not always the same. I wonder
what the difference between them is?

numberOfStores=1,
numberOfStorefiles=3


Re: what's different between numberOfStores and numberOfStorefiles in region status variables

2013-09-05 Thread lars hofhansl
Each column family is a store (in fact there is one store per region per
column family). Each region may have more than one actual HFile per store.
A new HFile (storefile) is created, for example, when the memstore is flushed to
disk. When too many HFiles have accumulated for a store, they are compacted into
fewer, larger files.
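
A small sketch that makes the distinction visible (the table name is a placeholder): each memstore flush can add a storefile to a store, while a major compaction rewrites a store's files back into one.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class FlushAndCompact {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Each flush writes the memstore out as a new HFile, so
        // numberOfStorefiles can grow while numberOfStores stays fixed
        // (one store per column family per region).
        admin.flush("mytable");
        // A major compaction rewrites a store's HFiles into a single file,
        // bringing numberOfStorefiles back down.
        admin.majorCompact("mytable");
    }
}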

-- Lars



- Original Message -
From: ch huang justlo...@gmail.com
To: user@hbase.apache.org
Cc: 
Sent: Thursday, September 5, 2013 10:29 PM
Subject: what's different between numberOfStores and numberOfStorefiles in 
region status variables

hi all:

  i check the region server status though
http://IP:60030/rs-status
and see for some region ,the two variables is not always same,i wonder that
what's different between them?

numberOfStores=1,
numberOfStorefiles=3