On May 21, 2012, at 10:31 AM, murat migdisoglu <murat.migdiso...@gmail.com> wrote:
> Hi,
>
> I'm quite new to Hadoop and trying to understand how task splitting works
> when used with the Cassandra ColumnFamilyInputFormat.
>
> I have a very basic scenario: Cassandra holds a session ID and BSON data
> that contains the username. I want to go through all rows and dump a row
> to a file when its username matches a certain criterion. I do not need any
> Reducer or Combiner for now.
>
> After writing the following very simple Hadoop job, I see from the logs
> that my mapper function is called once per row. Is that normal? If so,
> such a search over a big dataset would take hours, if not days...
>
> I guess I need a better understanding of how exactly the job is split
> into tasks.
>
> @Override
> public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
>     throws IOException, InterruptedException
> {
>     String rowkey = ByteBufferUtil.string(key);
>     // The match criterion is passed in through the job configuration.
>     String ip = context.getConfiguration().get(IP);
>     IColumn column = columns.get(sourceColumn); // sourceColumn: field holding the column name
>     if (column == null)
>         return;
>     ByteBuffer byteBuffer = column.value();
>     // Duplicate first: fromBson() advances the buffer's position.
>     ByteBuffer bb2 = byteBuffer.duplicate();
>
>     DataConvertor convertor = fromBson(byteBuffer, DataConvertor.class);
>     String username = convertor.getUsername();
>     if (username != null && username.equals(ip)) {
>         byte[] arr = convertToByteArray(bb2);
>         BytesWritable value = new BytesWritable(arr);
>         Text tkey = new Text(rowkey);
>         context.write(tkey, value);
>     } else {
>         log.info("ip not match [" + ip + "]");
>     }
> }
>
> Thanks in advance
> Kind Regards
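
For what it's worth: map() being invoked once per row is normal. The input
split size only controls how many rows each map task is handed, so it
determines task count and parallelism, not how often map() runs. Below is
a minimal driver sketch showing where that is configured, assuming the
Cassandra 1.x ConfigHelper API that matches this era; the keyspace
"MyKeyspace", column family "Sessions", column name "data", and the
SessionDumpMapper class are hypothetical placeholders, not names from the
original post.

    import java.nio.ByteBuffer;
    import java.util.Arrays;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SessionDumpJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Must match the key the mapper reads via getConfiguration().get(IP).
            conf.set("IP", args[0]);

            Job job = new Job(conf, "session-dump");
            job.setJarByClass(SessionDumpJob.class);
            job.setMapperClass(SessionDumpMapper.class); // hypothetical mapper class
            job.setNumReduceTasks(0);                    // map-only: no Reducer/Combiner
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(BytesWritable.class);

            job.setInputFormatClass(ColumnFamilyInputFormat.class);
            ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");
            ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
            ConfigHelper.setInputPartitioner(job.getConfiguration(),
                    "org.apache.cassandra.dht.RandomPartitioner");
            ConfigHelper.setInputColumnFamily(job.getConfiguration(),
                    "MyKeyspace", "Sessions");

            // Fetch only the one column the mapper inspects, not the whole row.
            SlicePredicate predicate = new SlicePredicate().setColumn_names(
                    Arrays.asList(ByteBuffer.wrap("data".getBytes("UTF-8"))));
            ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

            // Rows per input split: one map task is created per split, but map()
            // is still called once for every row inside that task.
            ConfigHelper.setInputSplitSize(job.getConfiguration(), 65536);

            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Narrowing the slice predicate to the single column keeps the amount of
data pulled from Cassandra small, but the username test itself still has
to run client-side in the mapper, since the value lives inside a BSON blob
that Cassandra cannot filter on.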