Is the footer on this email a little rough for content that will be passed around and made indexable on the internets?
Just saying :)

Cheers
James

Sent from my mobile. Please excuse the typos.

On 2010-09-10, at 8:01 PM, "Kaluskar, Sanjay" <[email protected]> wrote:

> Have you considered using something higher-level like PIG or Hive? Are
> there reasons why you need to process at this low level?
>
> -----Original Message-----
> From: Aaron Baff [mailto:[email protected]]
> Sent: Friday, September 10, 2010 11:50 PM
> To: [email protected]
> Subject: Custom Key class not working correctly
>
> So I'm pretty new to Hadoop, just learning it for work, and starting to
> play with some of our data on a VM cluster to see it work, and to make
> sure it can do what we need to. By and large, very cool, I think I'm
> getting the hang of it, but when I try and make a custom composite key
> class, it doesn't seem to correctly group the data correctly.
>
> The data is a bunch of phone numbers with various transactional data
> (timestamp, phone type, other call data). My Mapper is pretty much just
> taking the data, and splitting it out into a custom Key (or Text with
> just the phone number) and custom Value to hold the rest of the data.
>
> In my reducer, I'm counting the number of unique phone numbers among
> other things using a Reporter counter. Using my key class (code below),
> I get a total of 56,404 unique numbers which is way too low. When I use
> just the phone number (using Text) as the key, it gives me 1,159,558
> which is correct. In my custom class hashCode() method I'm just using
> the String.hashCode() for the String holding the phone number.
>
> That seemed reasonable to me, since I wanted it to group the values by
> the phone number, and then order by the timestamp which is what I'm
> doing in the compareTo() function.
>
>
> ========================================================================
> ====================
>
> import java.io.DataInput;
> import java.io.DataOutput;
> import java.io.IOException;
> import org.apache.hadoop.io.WritableComparable;
> import org.apache.hadoop.io.WritableComparator;
>
> public class AIMdnTimeKey implements WritableComparable {
>     String mdn = "";
>     long timestamp = -1L;
>     private byte oli = 0;
>
>     public AIMdnTimeKey() {
>     }
>
>     public AIMdnTimeKey( String initMdn, long initTimestamp) {
>         mdn = initMdn;
>         timestamp = initTimestamp;
>     }
>
>     public void setMdn( String newMdn ) {
>         mdn = newMdn;
>     }
>
>     public String getMdn() {
>         return mdn;
>     }
>
>     public void setTimestamp( long newTimestamp ) {
>         timestamp = newTimestamp;
>     }
>
>     public long getTimestamp() {
>         return timestamp;
>     }
>
>     public void write(DataOutput out) throws IOException {
>         out.writeUTF(mdn);
>         out.writeByte(oli);
>         out.writeLong(timestamp);
>     }
>
>     public void readFields(DataInput in) throws IOException {
>         mdn = in.readUTF();
>         oli = in.readByte();
>         timestamp = in.readLong();
>     }
>
>     public int compareTo(Object obj) throws ClassCastException {
>         if (obj == null) {
>             throw new ClassCastException("Object is NULL and so cannot be compared!");
>         }
>         if (getClass() != obj.getClass()) {
>             throw new ClassCastException("Object is of type "
>                     + obj.getClass().getName()
>                     + " which cannot be compared to this class of type "
>                     + getClass().getName());
>         }
>         final AIMdnTimeKey other = (AIMdnTimeKey) obj;
>
>         return (int)(this.timestamp - other.timestamp);
>     }
>
>     @Override
>     public int hashCode() {
>
>         return mdn.hashCode();
>     }
>
>     @Override
>     public boolean equals(Object obj) {
>         if (obj == null) {
>             return false;
>         }
>         if (getClass() != obj.getClass()) {
>             return false;
>         }
>         final AIMdnTimeKey other = (AIMdnTimeKey) obj;
>         if ((this.mdn == null) ? (other.mdn != null) : !this.mdn.equals(other.mdn)) {
>             return false;
>         }
>         return true;
>     }
>
>     @Override
>     public String toString() {
>         return mdn + " " + timestamp;
>     }
>
>     /**
>      * @return the oli
>      */
>     public byte getOli() {
>         return oli;
>     }
>
>     /**
>      * @param oli the oli to set
>      */
>     public void setOli(byte oli) {
>         this.oli = oli;
>     }
> }
>
> ========================================================================
> ====================
>
>
>
> Aaron Baff | Developer | Telescope, Inc.
>
> email: [email protected]<mailto:[email protected]> |
> office: 424 270 2913 | www.telescope.tv<http://www.telescope.tv/>
>
> The information contained in this email is confidential and may be
> legally privileged. It is intended solely for the addressee. Access to
> this email by anyone else is unauthorized. If you are not the intended
> recipient, any disclosure, copying, distribution or any action taken or
> omitted to be taken in reliance on it, is prohibited and may be
> unlawful. Any views expressed in this message are those of the
> individual and may not necessarily reflect the views of Telescope Inc.
> or its associated companies.
>
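
On the original question, a possible explanation (not verified against the poster's job or data): the quoted compareTo() looks only at the timestamp. Hadoop decides which keys belong together in the reducer by comparing keys, not by hashCode() or equals(), so two different phone numbers with the same timestamp would be treated as one key. The (int) cast of a long difference can also overflow and return a wrong sign or zero. Below is a minimal sketch of an ordering that compares the phone number first and the timestamp second. The class name MdnTimeKeySketch and the helper countDistinctMdns are made up for illustration, and the Writable methods are left out so that it runs on plain Java without Hadoop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the key class in the quoted message.
// write()/readFields() are omitted; only the ordering is illustrated.
class MdnTimeKeySketch implements Comparable<MdnTimeKeySketch> {
    final String mdn;      // phone number
    final long timestamp;  // transaction time

    MdnTimeKeySketch(String mdn, long timestamp) {
        this.mdn = mdn;
        this.timestamp = timestamp;
    }

    @Override
    public int compareTo(MdnTimeKeySketch other) {
        // Primary order: phone number, so different numbers never compare equal.
        int byMdn = mdn.compareTo(other.mdn);
        if (byMdn != 0) {
            return byMdn;
        }
        // Secondary order: timestamp. Long.compare avoids the overflow that
        // casting (this.timestamp - other.timestamp) to int can cause.
        return Long.compare(timestamp, other.timestamp);
    }

    // Illustrative helper (an assumption, not part of the original code):
    // sorts the keys and counts runs of identical phone numbers, roughly the
    // way a reducer sees one group per distinct phone number after sorting.
    static int countDistinctMdns(List<MdnTimeKeySketch> keys) {
        List<MdnTimeKeySketch> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        int count = 0;
        String previous = null;
        for (MdnTimeKeySketch key : sorted) {
            if (!key.mdn.equals(previous)) {
                count++;
                previous = key.mdn;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<MdnTimeKeySketch> keys = new ArrayList<>();
        keys.add(new MdnTimeKeySketch("5551230001", 200L));
        keys.add(new MdnTimeKeySketch("5551230002", 200L)); // same timestamp, other number
        keys.add(new MdnTimeKeySketch("5551230001", 100L));
        System.out.println(countDistinctMdns(keys)); // prints 2
    }
}
```

With an ordering like this, every (number, timestamp) pair is its own key. To get all timestamps for one number into a single reduce call, a grouping comparator that looks only at the phone number would also be needed; that part is not shown here.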
