I believe the recommended approach is to use CellUtil. It is marked Public/Evolving and exposes a number of static createCell() methods that build Cell instances from different combinations of parameters.
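For example, something along these lines (an untested sketch against the 1.0 API; the row/family/qualifier/value arrays below are just placeholder data):

  import org.apache.hadoop.hbase.Cell;
  import org.apache.hadoop.hbase.CellUtil;
  import org.apache.hadoop.hbase.HConstants;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.util.Bytes;

  public class CellUtilExample {
    public static void main(String[] args) {
      // Placeholder row/family/qualifier/value, purely for illustration.
      byte[] row = Bytes.toBytes("row1");
      byte[] family = Bytes.toBytes("f");
      byte[] qualifier = Bytes.toBytes("q");
      byte[] value = Bytes.toBytes("v");
      // Build a Put-type Cell without instantiating the private KeyValue
      // class directly.
      Cell cell = CellUtil.createCell(row, family, qualifier,
          HConstants.LATEST_TIMESTAMP, KeyValue.Type.Put.getCode(), value);
      System.out.println(cell);
    }
  }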
On Tue, Apr 14, 2015 at 9:04 AM Nick Dimiduk <[email protected]> wrote:

> Heya devs,
>
> Looking for an appropriate answer for this question, it seems we don't have
> a public Cell implementation that can be used for generating HFiles. How
> are folks expected to generate Cell instances if KeyValue is
> @InterfaceAudience.Private ?
>
> -n
>
> On Tue, Apr 14, 2015 at 9:00 AM, Nick Dimiduk <[email protected]> wrote:
>
> > What version of HBase are you using?
> >
> > In recent versions, HFileOutputFormat is a deprecated class (replaced by
> > HFileOutputFormat2), and KeyValue is an internal API (see the class
> > annotation @InterfaceAudience.Private; basically, use at your own peril).
> > The javadoc on the KeyValue constructor you're using says "Creates a
> > KeyValue from the start of the specified byte array. *Presumes bytes
> > content is formatted as a KeyValue blob*." (emphasis my own) It looks
> > like the value you're using for bytes is not in the KeyValue blob format.
> >
> > Try instead one of the other KeyValue constructors, such as
> > KeyValue(byte[] row, byte[] family, byte[] qualifier, byte[] value).
> >
> > Thanks,
> > Nick
> >
> > On Thu, Apr 9, 2015 at 12:23 PM, yuantao peng <[email protected]>
> > wrote:
> >
> >> I am learning how to upload binary data to HBase using MapReduce. Here
> >> are the steps I am following, assuming my binary file is testlist:
> >> (1) wrote a SequenceFileWrite.java to read the local testlist file and
> >> save a sequence file to HDFS
> >> (2) wrote a MapReduce program to read the generated sequence file and
> >> generate an HFile
> >> (3) bulk import this HFile into HBase
> >>
> >> I am stuck at step (2) as I keep getting an exception. I am absolutely
> >> new to hadoop/hbase; code is posted below, any comments or suggestions
> >> are appreciated!!!
> >>
> >> SequenceFileWrite.java is like this:
> >>
> >> public class SequenceFileWrite {
> >>   public static void main(String[] args) throws IOException {
> >>     String uri = args[1];
> >>     Configuration conf = new Configuration();
> >>     FileSystem fs = FileSystem.get(conf);
> >>     Path path = new Path(uri);
> >>     File infile = new File(args[0]);
> >>     SequenceFile.Writer writer = null;
> >>     try {
> >>       BytesWritable key, value;
> >>       writer = SequenceFile.createWriter(fs, conf, path,
> >>           BytesWritable.class, BytesWritable.class);
> >>       FileInputStream fin = new FileInputStream(infile);
> >>       for (int i = 0; i < 10; ++i) {
> >>         key = new BytesWritable();
> >>         value = new BytesWritable();
> >>         byte[] keybuf = new byte[2];
> >>         byte[] valbuf = new byte[2];
> >>         fin.read(keybuf);
> >>         fin.read(valbuf);
> >>         key.set(keybuf, 0, 2);
> >>         value.set(valbuf, 0, 2);
> >>         writer.append(key, value);
> >>       }
> >>     } finally {
> >>       IOUtils.closeStream(writer);
> >>     }
> >>   }
> >> }
> >>
> >> And my mapper is like this:
> >>
> >> public class HBaseTkrHdrMapper extends Mapper<BytesWritable,
> >>     BytesWritable, ImmutableBytesWritable, KeyValue> {
> >>
> >>   int tipOffSeconds = 0;
> >>   String tableName = "";
> >>
> >>   ImmutableBytesWritable hKey = new ImmutableBytesWritable();
> >>   KeyValue kv;
> >>
> >>   @Override
> >>   protected void setup(Context context) throws IOException,
> >>       InterruptedException {
> >>     Configuration c = context.getConfiguration();
> >>     tipOffSeconds = c.getInt("epoch.seconds.tipoff", 0);
> >>     tableName = c.get("hbase.table.mrtest");
> >>   }
> >>
> >>   @Override
> >>   protected void map(BytesWritable key, BytesWritable value,
> >>       Context context) throws IOException, InterruptedException {
> >>     ImmutableBytesWritable hkey =
> >>         new ImmutableBytesWritable(key.getBytes());
> >>     KeyValue hval = new KeyValue(value.getBytes());
> >>     context.write(hkey, hval);
> >>   }
> >> }
> >>
> >> Driver code is as follows:
> >>
> >> public class Driver {
> >>   public static void main(String[] args) throws Exception {
> >>     Configuration conf = new Configuration();
> >>     args = new GenericOptionsParser(conf, args).getRemainingArgs();
> >>
> >>     @SuppressWarnings("deprecation")
> >>     Job job = new Job(conf, "Bulk Import");
> >>     job.setJarByClass(HBaseTkrHdrMapper.class);
> >>
> >>     job.setMapperClass(HBaseTkrHdrMapper.class);
> >>     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> >>     job.setMapOutputValueClass(KeyValue.class);
> >>     job.setInputFormatClass(SequenceFileInputFormat.class);
> >>
> >>     HTable hTable = new HTable(conf, args[2]);
> >>
> >>     // Auto configure partitioner and reducer
> >>     HFileOutputFormat.configureIncrementalLoad(job, hTable);
> >>
> >>     FileInputFormat.addInputPath(job, new Path(args[0]));
> >>     FileOutputFormat.setOutputPath(job, new Path(args[1]));
> >>
> >>     job.waitForCompletion(true);
> >>   }
> >> }
> >>
> >> The exception I got is:
> >>
> >> Error: java.lang.IllegalArgumentException: offset (0) + length (4)
> >> exceed the capacity of the array: 3
> >>   at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:602)
> >>   at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:751)
> >>   at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:737)
> >>   at org.apache.hadoop.hbase.KeyValue.getLength(KeyValue.java:972)
> >>   at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:276)
> >>   at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:265)
> >>   at com.bloomberg.tickerplant.hbase.HBaseTkrHdrMapper.map(HBaseTkrHdrMapper.java:41)
> >>   at com.bloomberg.tickerplant.hbase.HBaseTkrHdrMapper.map(HBaseTkrHdrMapper.java:23)
> >>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> >>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> >>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
> >>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> >>   at java.security.AccessController.doPrivileged(Native Method)
> >>   at javax.security.auth.Subject.doAs(Subject.java:415)
> >>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> >>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> >>
> >> Exception in thread "main" java.io.IOException:
> >> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException):
> >> java.lang.NullPointerException
> >>   at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getTaskAttemptCompletionEvents(HistoryClientService.java:269)
> >>   at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBServiceImpl.java:173)
> >>   at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:283)
> >>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> >>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> >>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)
> >>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> >>   at java.security.AccessController.doPrivileged(Native Method)
> >>   at javax.security.auth.Subject.doAs(Subject.java:415)
> >>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> >>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)
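Coming back to the stack trace above: the single-argument KeyValue(byte[]) constructor tries to read a 4-byte length prefix out of the array, so handing it a raw 2-byte payload fails exactly this way. A minimal sketch of the map() fix Nick suggests, replacing the map() method in HBaseTkrHdrMapper above (plus imports for java.util.Arrays and org.apache.hadoop.hbase.util.Bytes; the "f"/"q" family and qualifier names are made-up placeholders):

  @Override
  protected void map(BytesWritable key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    // BytesWritable.getBytes() returns the (possibly padded) backing array;
    // trim both buffers to their valid lengths first.
    byte[] row = Arrays.copyOf(key.getBytes(), key.getLength());
    byte[] val = Arrays.copyOf(value.getBytes(), value.getLength());
    // Build the KeyValue from its parts instead of asking KeyValue to parse
    // the raw bytes as a serialized blob.
    KeyValue hval = new KeyValue(row, Bytes.toBytes("f"), Bytes.toBytes("q"), val);
    context.write(new ImmutableBytesWritable(row), hval);
  }

Whatever family you emit should also exist on the table handed to configureIncrementalLoad, or the bulk load at step (3) will reject the generated HFiles.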
