HBase MR - key/value mismatch
I'm trying to execute MR code against a stand-alone HBase (0.94.11). I read the HBase API, modified my MR code to read data, and am now getting exceptions in the reduce phase. The exception I get is:

13/09/05 16:16:17 INFO mapred.JobClient: map 0% reduce 0%
13/09/05 16:23:31 INFO mapred.JobClient: Task Id : attempt_201309051437_0005_m_00_0, Status : FAILED
java.io.IOException: wrong key class: class org.apache.hadoop.hbase.io.ImmutableBytesWritable is not class org.apache.hadoop.io.Text
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:164)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1168)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1492)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at com.hbase.mapreduce.SentimentCalculationHBaseReducer.reduce(SentimentCalculationHBaseReducer.java:199)
    at com.hbase.mapreduce.SentimentCalculationHBaseReducer.reduce(SentimentCalculationHBaseReducer.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1513)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

Providing the partial code (excluding the business logic).

Mapper:

public class SentimentCalculationHBaseMapper extends TableMapper<Text, Text> {

    private Text sentenseOriginal = new Text();
    private Text sentenseParsed = new Text();

    @Override
    protected void map(ImmutableBytesWritable key, Result value,
            org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable, Result, Text, Text>.Context context)
            throws IOException, InterruptedException {
        context.write(this.sentenseOriginal, this.sentenseParsed);
    }
}

Reducer:

public class SentimentCalculationHBaseReducer extends
        TableReducer<Text, Text, ImmutableBytesWritable> {

    @Override
    protected void reduce(Text key, java.lang.Iterable<Text> values,
            org.apache.hadoop.mapreduce.Reducer<Text, Text, ImmutableBytesWritable, org.apache.hadoop.io.Writable>.Context context)
            throws IOException, InterruptedException {
        Double mdblSentimentOverall = 0.0;
        String d3 = key + "@12321@" + s11.replaceFirst(":::", "") + "@12321@"
                + mstpositiveWords + "@12321@" + mstnegativeWords + "@12321@"
                + mstneutralWords;
        System.out.println("d3 : " + d3 + " , mdblSentimentOverall : " + mdblSentimentOverall);
        Put put = new Put(d3.getBytes());
        put.add(Bytes.toBytes("word_attributes"),
                Bytes.toBytes(mdblSentimentOverall),
                Bytes.toBytes(mdblSentimentOverall));
        System.out.println("Context is " + context);
        context.write(new ImmutableBytesWritable(d3.getBytes()), put);
    }
}

SentimentCalculatorHBase - the Tool/main class:

package com.hbase.mapreduce;

import java.util.Calendar;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SentimentCalculatorHBase extends Configured implements Tool {

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        SentimentCalculatorHBase sentimentCalculatorHBase = new SentimentCalculatorHBase();
        ToolRunner.run(sentimentCalculatorHBase, args);
    }

    @Override
    public int run(String[] arg0) throws Exception {
        System.out.println("***Configuration
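The run() method is truncated in the archive above. As a point of reference, here is a minimal sketch of typical 0.94 job wiring for this mapper/reducer pair; the table names and Scan settings are illustrative, not from the original post:

    // Sketch only: "sourceTable"/"targetTable" are hypothetical names.
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = new Job(conf, "SentimentCalculatorHBase");
        job.setJarByClass(SentimentCalculatorHBase.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // fetch rows in batches per RPC
        scan.setCacheBlocks(false); // don't pollute the block cache from MR

        // Registers the mapper and declares Text/Text as the map output
        // key/value types for the shuffle.
        TableMapReduceUtil.initTableMapperJob("sourceTable", scan,
                SentimentCalculationHBaseMapper.class, Text.class, Text.class, job);
        // Registers the reducer and TableOutputFormat for the target table.
        TableMapReduceUtil.initTableReducerJob("targetTable",
                SentimentCalculationHBaseReducer.class, job);

        return job.waitForCompletion(true) ? 0 : 1;
    }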
Re: Suggestion needed on designing Flatten table for HBase given scenario
The attachment in your original email didn't go through. Please put it on some website so that everyone can see it. Thanks

On Sep 4, 2013, at 10:24 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Hi, I have shared it with you on Google+. Can you not see the picture as an attachment in my earlier mail? Request you to confirm, else I will resend the mail with the attachment... regards, Rams

On Thu, Sep 5, 2013 at 10:39 AM, Ted Yu yuzhih...@gmail.com wrote: I don't see the image. Can you upload it to some website? Thanks

On Sep 4, 2013, at 10:05 PM, Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com wrote: Dear All, For the below 1-to-many relationship column sets, I need suggestions on how to design a flattened HBase table... Kindly refer to the attached image for the scenario... Pls let me know if my scenario is not clearly explained... regards, Rams
Re: HBase MR - key/value mismatch
Try using Bytes.toBytes(yourString) rather than String.getBytes(). Regards, Shahab

On Thu, Sep 5, 2013 at 2:16 AM, Omkar Joshi omkar.jo...@lntinfotech.com wrote: [original message quoted in full; see the first post in this thread]
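For context, a minimal sketch of the difference behind Shahab's suggestion (d3 is the string from the posted reducer):

    // String.getBytes() encodes with the JVM's platform-default charset,
    // which can differ between client and cluster machines;
    // Bytes.toBytes(String) always encodes UTF-8, so row keys stay
    // byte-identical everywhere.
    byte[] platformDependent = d3.getBytes();
    byte[] alwaysUtf8 = Bytes.toBytes(d3);

    Put put = new Put(Bytes.toBytes(d3));
    context.write(new ImmutableBytesWritable(Bytes.toBytes(d3)), put);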
Re: user action modeling
Your read queries seem to be driven more from the 'action' and 'object' perspective than from the user's. 1- So one option is that you make a composite key with action and object, action|object, and the columns are the users who are generating events on this combination. You can scan using a prefix filter if you want to look at data for a specific set of action and object, i.e. your requirements 1, 3 and 4. Key distribution should be OK too. The drawbacks here are that a) you can end up with really wide rows, and b) what if you want to store more information than just the user id in the columns? The friends part is not that trivial and you would have to maintain that relationship outside this main table or create complex composite entities (I need to think about it more; HBase is not a graph database.) Regards, Shahab

On Thu, Sep 5, 2013 at 1:16 AM, Marcos Sousa marcoscaixetaso...@gmail.com wrote: Hi, I've been working with HBase for the last 3 months, and now I have to store user actions, at first look, using HBase. I have a limited number of actions, thousands of objects and about 50 million users interacting with them, around 2 billion interactions per month. I have to answer these questions:

How many users performed action 'foo' at object 'bar'
What friends performed action 'foo' at object 'bar'
What users made 'foo' at object 'bar' last week
What objects received action 'foo' the most

Does anybody have suggestions for a schema for this problem? Best regards, -- Marcos Sousa
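A minimal sketch of the action|object layout Shahab describes; the column family and all variable names here are hypothetical:

    // Row key: action + '|' + object. One column per user who generated
    // the event, so "how many users did 'foo' on 'bar'" is one row read.
    byte[] rowKey = Bytes.toBytes(action + "|" + objectId);
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("u"), Bytes.toBytes(userId), Bytes.toBytes(eventTimeMillis));
    table.put(put);

    // "What objects received action 'foo' the most": prefix-scan all rows
    // for one action and compare per-row column counts.
    Scan scan = new Scan(Bytes.toBytes("foo|"));
    scan.setFilter(new PrefixFilter(Bytes.toBytes("foo|")));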
Programming practices for implementing composite row keys
Hello people, I have a scenario which requires creating composite row keys for my HBase table. Basically it would be entity1,entity2,entity3. Search would be based on entity1 first, then entity2 and entity3. I know I can do a row start/stop scan on entity1 first and then put row filters on entity2 and entity3. My question is: what are the best programming principles to implement these keys?

1. Just use simple delimiters: entity1:entity2:entity3.
2. Create complex datatypes, like Java structures. I don't know if anyone uses structures as keys, and if they do, can someone please explain for which scenarios they would be a good fit? Do they fit well for this scenario?
3. What are the pros and cons of both 1 and 2 when it comes to data retrieval?
4. My entity1 can be negative as well. Does that make any special difference where HBase ordering is concerned? How can I tackle this scenario?

Any help on how to implement composite row keys would be highly appreciated. I want to understand how the community deals with implementing composite row keys. Regards Praveenesh
Re: HBase MR - key/value mismatch
public class SentimentCalculationHBaseReducer extends
        TableReducer<Text, Text, ImmutableBytesWritable> {

The first type parameter for the reducer should be ImmutableBytesWritable. Cheers

On Wed, Sep 4, 2013 at 11:16 PM, Omkar Joshi omkar.jo...@lntinfotech.com wrote: [original message quoted in full; see the first post in this thread]
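A sketch of the signature Ted is suggesting, keyed by ImmutableBytesWritable; note that the mapper would then also have to emit ImmutableBytesWritable keys:

    public class SentimentCalculationHBaseReducer extends
            TableReducer<ImmutableBytesWritable, Text, ImmutableBytesWritable> {

        @Override
        protected void reduce(ImmutableBytesWritable key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            // ... build the Put from key/values and context.write(...) it
        }
    }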
Re: user action modeling
Hi, Yes, that's the point, I need to save dynamic parameters for each action :( I was thinking about distributing the data across 3 tables: - users: which has all data about the user, plus the list of friends and the documents on which he performed the action - user_actions: to save the action and further parameters - objects: to save the list of users who performed the action. Replicating data like that, I will have 3 times more write operations. Curiously, the intersect part, aka friends who did the same action, is one of the most used parts. Regards, Marcos Sousa

On Thu, Sep 5, 2013 at 10:55 AM, Shahab Yunus shahab.yu...@gmail.com wrote: [quoted thread trimmed; see the messages above]
Re: Programming practices for implementing composite row keys
For #2 and #4, see HBASE-8693 'DataType: provide extensible type API', which has been integrated into 0.96. Cheers

On Thu, Sep 5, 2013 at 7:14 AM, Shahab Yunus shahab.yu...@gmail.com wrote: My 2 cents:

1- Yes, that is one way to do it. You can also use a fixed length for every attribute participating in the composite key; HBase scans fit that pattern well too, I believe (?). It's a trade-off, basically, between space (all that padding increases the key size) and the complexity of choosing and handling a delimiter and the consequent parsing of keys.
2- I personally have not heard of this. As far as I understand, it goes against the whole idea of HBase scanning, and prefix and fuzzy filters would not be possible this way. I would not follow it.
3- See the replies to 1 & 2.
4- The sorting of keys is, by default, a binary comparison. It is a bit tricky for negative numbers, as far as I know and the last I checked. Some tips here: http://stackoverflow.com/questions/17248510/hbase-filters-not-working-for-negative-integers Can you normalize the values (or take an absolute value) before reading and writing (of course at some performance cost), if your data allows it, i.e. keys with the same amount but different sign cannot both exist? This depends on your business logic and the type/nature of the data.

Regards, Shahab

On Thu, Sep 5, 2013 at 10:03 AM, praveenesh kumar praveen...@gmail.com wrote: [original message quoted in full; see the question post above]
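On #4, a minimal sketch of the usual alternative to normalizing: flip the sign bit before serializing so that two's-complement integers sort correctly under HBase's unsigned byte comparison (a hypothetical helper, not from the thread; uses org.apache.hadoop.hbase.util.Bytes):

    // -1 encodes to 0x7FFFFFFF and 0 to 0x80000000, so negative values
    // sort before positive ones when compared as unsigned bytes.
    public static byte[] encodeSortableInt(int v) {
        return Bytes.toBytes(v ^ Integer.MIN_VALUE);
    }

    public static int decodeSortableInt(byte[] b) {
        return Bytes.toInt(b) ^ Integer.MIN_VALUE;
    }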
Re: Programming practices for implementing composite row keys
Ah! I didn't know about HBASE-8693. Good information. Thanks Ted. Regards, Shahab

On Thu, Sep 5, 2013 at 10:53 AM, Ted Yu yuzhih...@gmail.com wrote: [quoted thread trimmed; see the messages above]
Re: HBase MR - key/value mismatch
The reducer also serves as the combiner, whose output would be sent to the reducer:

org.apache.hadoop.mapreduce.Reducer<Text, Text, ImmutableBytesWritable, org.apache.hadoop.io.Writable>.Context context)

So the type parameters above should facilitate this. Take a look at PutCombiner from the HBase source code:

public class PutCombiner<K> extends Reducer<K, Put, K, Put> {

Cheers

On Thu, Sep 5, 2013 at 9:46 AM, Shahab Yunus shahab.yu...@gmail.com wrote: [quoted text trimmed; Shahab's question appears in full below]
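To make Ted's point concrete: a combiner's output key/value types must equal the map output types, because combiner output re-enters the shuffle; that is exactly why the stack trace above fails inside NewCombinerRunner. A minimal sketch of a type-correct pass-through combiner for this job (the class name is hypothetical; simply not registering the TableReducer as combiner also resolves the exception):

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SentimentCombiner extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Pass everything through unchanged; the TableReducer, which
            // emits (ImmutableBytesWritable, Put), then runs only as the
            // real reduce phase and never inside the map-side spill.
            for (Text value : values) {
                context.write(key, value);
            }
        }
    }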
Concurrent connections to HBase
Hi All, I'd like to hear from users who are running a big HBase setup with multiple concurrent connections. I would like to know the # of cores/machines, # of queries/Gets/RPCs, HBase version, etc. We are trying to build an application with sub-second query performance (using coprocessors) and want to scale it out to tens of thousands of concurrent queries. We are now at 500-600 and do see a bug like HBASE-9410. Any positive/negative experiences in a similar situation? Regards, - kiru Kiru Pakkirisamy | webcloudtech.wordpress.com
Re: HBase MR - key/value mismatch
Ted, Might be something very basic that I am missing, but why should the OP's reducer key be of type ImmutableBytesWritable if he is emitting Text in the mapper? Thanks.

protected void map(ImmutableBytesWritable key, Result value,
        org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable, Result, Text, Text>.Context context)
        throws IOException, InterruptedException {
    context.write(this.sentenseOriginal, this.sentenseParsed); // sentenseOriginal is Text

Regards, Shahab

On Thu, Sep 5, 2013 at 10:34 AM, Ted Yu yuzhih...@gmail.com wrote: [quoted thread trimmed; see Ted's reply above]
Re: Concurrent connections to HBase
Hey Kiru, The Phoenix team would be happy to work with you to benchmark your performance if you can give us specifics about your schema design, queries, and data sizes. We did something similar for Sudarshan for a Bloomberg use case here [1]. Thanks, James [1] http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/34697

On Thu, Sep 5, 2013 at 10:28 AM, Kiru Pakkirisamy kirupakkiris...@yahoo.com wrote: [original message quoted in full; see above]
[ANN]: HBase-Writer 0.94.0 available for download
The HBase-Writer team is happy to announce that HBase-Writer 0.94.0 is available for download: http://code.google.com/p/hbase-writer/downloads/list HBase-Writer 0.94.0 is a maintenance release that fixes library compatibility with newer versions of Heritrix and HBase. More details may be found on the HBase-Writer blog [1]. Users upgrading from previous versions of HBase-Writer are recommended to also upgrade to Heritrix 3.1.1 and HBase 0.95.2. Thank you and happy crawling :) -The HBase-Writer Team 1. http://opensourcemasters.com/weblog/general/hbase-writer-0.94.0-released.html
FILE_BYTES_READ counter missing for HBase mapreduce job
Hi, Basically I have a mapreduce job that scans an HBase table and does some processing. After the job finishes, I only get three filesystem counters: HDFS_BYTES_READ, HDFS_BYTES_WRITTEN and FILE_BYTES_WRITTEN. The value of HDFS_BYTES_READ is not very useful here because it shows the size of the .META. file, not the size of the input records. I am looking for the counter FILE_BYTES_READ, but somehow it's missing from the job status report. Does anyone know what I might be missing here? Thanks Haijia

P.S. The job status report:

FileSystemCounters      Map              Reduce   Total
HDFS_BYTES_READ         340,124          0        340,124
FILE_BYTES_WRITTEN      190,431,329      0        190,431,329
HDFS_BYTES_WRITTEN      272,538,467,123  0        272,538,467,123
Re: FILE_BYTES_READ counter missing for HBase mapreduce job
Additional info: the mapreduce job I run is a map-only job. It does not have reducers, and it writes data directly to HDFS in the mapper. Could this be the reason why there's no value for FILE_BYTES_READ? If so, is there any easy way to get the total input data size? Thanks Haijia

On Thu, Sep 5, 2013 at 2:46 PM, Haijia Zhou leons...@gmail.com wrote: [original message quoted in full; see above]
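For what it's worth: FILE_BYTES_READ counts local-disk reads during spill merges and the reduce-side shuffle, so a map-only job that streams straight to HDFS typically never increments it. A hedged sketch of pulling a counter that does exist for a TableInputFormat job (the group/counter names are the Hadoop 1.x ones; this gives a record count, not bytes):

    // After job.waitForCompletion(true) returns:
    Counters counters = job.getCounters();
    long rowsScanned = counters.findCounter(
            "org.apache.hadoop.mapred.Task$Counter",
            "MAP_INPUT_RECORDS").getValue();
    System.out.println("rows scanned from HBase: " + rowsScanned);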
Re: Programming practices for implementing composite row keys
Greetings, Other food for thought: some case studies on composite rowkey design are in the refguide: http://hbase.apache.org/book.html#schema.casestudies

On 9/5/13 12:15 PM, Anoop John anoop.hb...@gmail.com wrote: [quoted thread trimmed; see the messages in this thread]
Re: Suggestion needed on designing Flatten table for HBase given scenario
Greetings, The refguide has some case studies on composite rowkey design that might be helpful: http://hbase.apache.org/book.html#schema.casestudies

From: Ramasubramanian Narayanan ramasubramanian.naraya...@gmail.com
Reply-To: user@hbase.apache.org
Date: Thursday, September 5, 2013 1:05 AM
To: user@hbase.apache.org
Subject: Suggestion needed on designing Flatten table for HBase given scenario

[original message quoted in full; see the thread above]
Re: Programming practices for implementing composite row keys
Hi, Have a look at Phoenix [1]. There you can define a composite row key model, and it handles the negative-number ordering. The scan model you mentioned is also well supported, with a start/stop row key on entity1 and a SkipScanFilter for the others. -Anoop- [1] https://github.com/forcedotcom/phoenix

On Thu, Sep 5, 2013 at 8:58 PM, Shahab Yunus shahab.yu...@gmail.com wrote: [quoted thread trimmed; see the messages above]
what's the difference between numberOfStores and numberOfStorefiles in region status variables
Hi all, I check the region server status through http://IP:60030/rs-status and see that for some regions the two variables are not always the same. What is the difference between them? numberOfStores=1, numberOfStorefiles=3
Re: what's the difference between numberOfStores and numberOfStorefiles in region status variables
Each column family is a store (in fact there is one store per region per column family). Each region may have more than one actual HFile per store. A new HFile (storefile) is created, for example, when the memstore is flushed to disk. When too many HFiles have accumulated for a store, they are compacted into fewer, larger files. -- Lars

----- Original Message -----
From: ch huang justlo...@gmail.com
To: user@hbase.apache.org
Sent: Thursday, September 5, 2013 10:29 PM
Subject: what's the difference between numberOfStores and numberOfStorefiles in region status variables

[original message quoted in full; see above]
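To see Lars's point in action, a small sketch against the 0.94 admin API (the table name is hypothetical): each memstore flush adds one storefile per store, and a major compaction merges each store back down to a single file.

    // Uses org.apache.hadoop.hbase.client.HBaseAdmin from the 0.94 client.
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.flush("mytable");         // memstore -> one new HFile per store
    admin.majorCompact("mytable");  // asks each store to rewrite down to one HFile
    admin.close();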