Hi Stack,
When I try to add multiple values to the same column, I can't see them when
I scan the table. I did what you suggested and here is the code I have
written:
public class UploadMoviesInfo extends Configured implements Tool
{
    public static class MapClass extends MapReduceBase implements
            Mapper<LongWritable, Text, IntWritable, MapWritable>
    {
        public void map(LongWritable key, Text value,
                OutputCollector<IntWritable, MapWritable> output,
                Reporter reporter) throws IOException
        {
            String line = value.toString();
            String[] result = line.split("%");
            MapWritable mw = new MapWritable();
            mw.put(new Text("name:name"), new Text(result[1]));
            mw.put(new Text("rating_value:rating_value"), new Text(result[2]));
            mw.put(new Text("country:country"), new Text(result[3]));
            String[] genres = result[4].split(",");
            IntWritable iw = new IntWritable(Integer.parseInt(result[0]));
            for (int i = 0; i < genres.length; i++)
            {
                mw.put(new Text("genre:genre"), new Text(genres[i]));
                output.collect(iw, mw);
            }
        }
    }

    public static class ReduceClass extends TableReduce<IntWritable, MapWritable>
    {
        @Override
        public void reduce(IntWritable key, Iterator<MapWritable> values,
                OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
                Reporter reporter) throws IOException
        {
            reporter.setStatus("Reducer committing " + key);
            ImmutableBytesWritable ibw =
                    new ImmutableBytesWritable(Bytes.toBytes(key.get()));
            BatchUpdate outval = new BatchUpdate(Bytes.toBytes(key.get()));
            while (values.hasNext())
            {
                MapWritable hmw = new MapWritable(values.next());
                outval.put("rating_value:", Bytes.toBytes(
                        hmw.get(new Text("rating_value:rating_value")).toString()));
                outval.put("name:", Bytes.toBytes(
                        hmw.get(new Text("name:name")).toString()));
                outval.put("country:", Bytes.toBytes(
                        hmw.get(new Text("country:country")).toString()));
                outval.put("genre:", Bytes.toBytes(
                        hmw.get(new Text("genre:genre")).toString()));
                output.collect(ibw, outval);
            }
        }
    }
}
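The overwrite can be reproduced with a plain java.util.Map standing in for one BatchUpdate row (the class and method names below are mine, not from the code above): putting under the same column name repeatedly keeps only the last value, while a distinct qualifier per genre (genre:0, genre:1, ...) would keep all of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GenreOverwriteDemo {
    // Same column name for every genre: each put overwrites the previous
    // one, which is what repeated outval.put("genre:", ...) does inside a
    // single BatchUpdate for one row.
    public static Map<String, String> sameColumn(String[] genres) {
        Map<String, String> row = new LinkedHashMap<String, String>();
        for (String g : genres) {
            row.put("genre:", g.trim());
        }
        return row;
    }

    // One possible workaround: a distinct qualifier per genre.
    public static Map<String, String> perQualifier(String[] genres) {
        Map<String, String> row = new LinkedHashMap<String, String>();
        for (int i = 0; i < genres.length; i++) {
            row.put("genre:" + i, genres[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        String[] genres =
                "Action/Adventure, Thriller, Crime/Gangster, Adaptation".split(",");
        System.out.println(sameColumn(genres).size());   // 1 -- only "Adaptation" survives
        System.out.println(perQualifier(genres).size()); // 4
    }
}
```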
The text file I am parsing looks like this:
1808512447%Never Die Alone%A%United States%Action/Adventure, Thriller,
Crime/Gangster, Adaptation
1807776058%Lilo and Stitch%PG-13%United States%Comedy, Kids/Family, Science
Fiction/Fantasy, Animation
1808467879%Something's Gotta Give%PG-13%United States%Comedy, Romance
1809809725%Aqua Teen Hunger Force Colon Movie Film for Theaters%PG%United
States%Comedy, Animation, Adaptation
1809423256%Lady Chatterley%PG-13%France%Art/Foreign, Drama, Adaptation
1808573131%The Blind Swordsman: Zatoichi%PG-13%Japan%Action/Adventure,
Art/Foreign, Drama
1809374864%Ossessione%PG-13%Italy%Drama
1808746739%Love%Unrated%United States%Thriller
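A quick JDK-only check of the split logic against the first sample line (the class name is mine) confirms five %-separated fields and four comma-separated genres, and also shows that every genre after the first keeps a leading space unless trimmed:

```java
public class SplitCheck {
    // Mirrors the mapper's parsing: fields split on '%', genres on ','.
    public static String[] fields(String line) {
        return line.split("%");
    }

    public static void main(String[] args) {
        String line = "1808512447%Never Die Alone%A%United States%"
                + "Action/Adventure, Thriller, Crime/Gangster, Adaptation";
        String[] f = fields(line);
        System.out.println(f.length);              // 5
        String[] genres = f[4].split(",");
        System.out.println(genres.length);         // 4
        System.out.println("[" + genres[1] + "]"); // [ Thriller] -- note the space
    }
}
```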
So according to this, the genre column should have four genres (comma
separated) for the first movie, but I only find one when I scan the table.
Please let me know if I am doing something wrong. Also, about my question
below: the IntWritables get changed to those escaped characters, and then I
am unable to use the HBase shell to query the data. Is there a workaround?
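On the shell question: Bytes.toBytes(int) stores the 4-byte big-endian encoding of the integer, and the shell escapes every byte that is not printable ASCII, which is why the keys come out as \000\000C|. A JDK-only sketch (ByteBuffer stands in for HBase's Bytes; the class name is mine) of one workaround, storing the decimal string instead:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyEncodingDemo {
    // What Bytes.toBytes(int) produces: 4 bytes, big-endian.
    public static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Workaround: store the decimal string; every byte is printable ASCII,
    // at the cost of losing the numeric sort order of the fixed-width form.
    public static byte[] stringBytes(int v) {
        return String.valueOf(v).getBytes(StandardCharsets.UTF_8);
    }

    public static boolean allPrintable(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0x20 || b > 0x7e) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allPrintable(intBytes(1999)));    // false: 00 00 07 CF
        System.out.println(allPrintable(stringBytes(1999))); // true: '1' '9' '9' '9'
    }
}
```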
Thanks
On Fri, Nov 28, 2008 at 3:50 PM, Nishant Khurana <[EMAIL PROTECTED]> wrote:
> Thanks,
> It worked :) One more question: when I store integer values as row keys
> or any column values and run a scan from the hbase shell, they come out
> like this:
> \000\000C| column=year:, timestamp=1227905036961,
> value=1999
> \000\000C~ column=name:, timestamp=1227905036962,
> value=The 39 Steps
> \000\000C~ column=yahoo_movie_id:,
> timestamp=1227905036962, value=k{I\357\277\275
> \000\000C~ column=year:, timestamp=1227905036962,
> value=1935
> \000\000C\200 column=name:, timestamp=1227905036962,
> value=Prophecy
> \000\000C\200 column=yahoo_movie_id:,
> timestamp=1227905036962, value=k\357\277\275\n@
> \000\000C\200 column=year:, timestamp=1227905036962,
> value=1979
>
> Notice the first column and the value, both of which were integers. Is it
> because they get converted to ImmutableBytesWritable that they look like
> this? Can I store them in readable form?
> Thanks
>
>
>
>
> On Fri, Nov 28, 2008 at 3:08 PM, stack <[EMAIL PROTECTED]> wrote:
>
>> How is the job being set up? I'd suspect you are calling
>> initTableReduceJob in your job setup. Look at what it does: it sets the
>> reduce key type. Maybe after calling it, reset the reduce key type to
>> IntWritable.
>> St.Ack
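A minimal sketch of the setup St.Ack describes (the table name and the exact 0.18-era initJob signature here are my assumptions):

```java
// Hypothetical job setup. initJob wires the reducer to the output table and
// sets the job's output key class to ImmutableBytesWritable; without the two
// setMapOutput* calls, the map output key type defaults to that as well,
// which is the "Type mismatch in key from map" error below.
JobConf job = new JobConf(getConf(), UploadMoviesList.class);
TableReduce.initJob("movies", ReduceClass.class, job);
job.setMapOutputKeyClass(IntWritable.class);
job.setMapOutputValueClass(MapWritable.class);
```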
>>
>>
>>
>> On Fri, Nov 28, 2008 at 11:48 AM, Nishant Khurana <[EMAIL PROTECTED]> wrote:
>>
>> > Hi,
>> > I am trying to run a map-reduce job which parses a text file and fills
>> > up an HBase table. Following is the code:
>> >
>> >
>> > public class UploadMoviesList extends Configured implements Tool
>> > {
>> >     public static class MapClass extends MapReduceBase implements
>> >             Mapper<LongWritable, Text, IntWritable, MapWritable>
>> >     {
>> >         public void map(LongWritable key, Text value,
>> >                 OutputCollector<IntWritable, MapWritable> output,
>> >                 Reporter reporter) throws IOException
>> >         {
>> >             String line = value.toString();
>> >             String[] result = line.split("%");
>> >             MapWritable mw = new MapWritable();
>> >             mw.put(new Text("year:year"), new Text(result[1]));
>> >             mw.put(new Text("name:name"), new Text(result[2]));
>> >             mw.put(new Text("y_movie_id:y_movie_id"),
>> >                     new IntWritable(Integer.parseInt(result[3])));
>> >             output.collect(new IntWritable(Integer.parseInt(result[0])), mw);
>> >         }
>> >     }
>> >
>> >     public static class ReduceClass extends TableReduce<IntWritable, MapWritable>
>> >     {
>> >         @Override
>> >         public void reduce(IntWritable key, Iterator<MapWritable> values,
>> >                 OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
>> >                 Reporter reporter) throws IOException
>> >         {
>> >             reporter.setStatus("Reducer committing " + key);
>> >             ImmutableBytesWritable ibw =
>> >                     new ImmutableBytesWritable(Bytes.toBytes(key.get()));
>> >             BatchUpdate outval = new BatchUpdate(Bytes.toBytes(key.get()));
>> >             while (values.hasNext())
>> >             {
>> >                 MapWritable hmw = new MapWritable(values.next());
>> >                 outval.put("year:year",
>> >                         Bytes.toBytes(hmw.get("year:year").toString()));
>> >                 outval.put("name:name",
>> >                         Bytes.toBytes(hmw.get("name:name").toString()));
>> >                 IntWritable iw = (IntWritable) hmw.get("y_movie_id:y_movie_id");
>> >                 outval.put("y_movie_id:y_movie_id", Bytes.toBytes(iw.get()));
>> >                 output.collect(ibw, outval);
>> >             }
>> >         }
>> >     }
>> > }
>> >
>> >
>> > When I try to run it, I get the following exception:
>> > 08/11/28 14:42:27 INFO mapred.JobClient: Task Id :
>> > attempt_200811281158_0005_m_000001_0, Status : FAILED
>> > java.io.IOException: Type mismatch in key from map: expected
>> > org.apache.hadoop.hbase.io.ImmutableBytesWritable, recieved
>> > org.apache.hadoop.io.IntWritable
>> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
>> >     at dist_q_data.UploadMoviesList$MapClass.map(UploadMoviesList.java:45)
>> >     at dist_q_data.UploadMoviesList$MapClass.map(UploadMoviesList.java:1)
>> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>> >
>> >
>> > I don't know why it says it expects an ImmutableBytesWritable key. Any
>> > suggestions?
>> > Thanks
>> >
>> >
>>
>
>
>
>
--
Nishant Khurana
Candidate for Masters in Engineering (Dec 2009)
Computer and Information Science
School of Engineering and Applied Science
University of Pennsylvania