Need some more help. I wrote a sequence file using the code below, but now when I run a MapReduce job against it I get:

    java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

even though I never used LongWritable when I originally wrote the sequence file.
    // Code that writes the sequence file -- there is no LongWritable here.
    org.apache.hadoop.io.Text key = new org.apache.hadoop.io.Text();
    org.apache.hadoop.io.Text value = new org.apache.hadoop.io.Text();
    BufferedReader buffer = new BufferedReader(new FileReader(filePath));
    String line = null;
    try {
        writer = SequenceFile.createWriter(fs, conf, path, key.getClass(),
                value.getClass(), SequenceFile.CompressionType.RECORD);
        int i = 1;
        long timestamp = System.currentTimeMillis();
        while ((line = buffer.readLine()) != null) {
            key.set(String.valueOf(timestamp));
            value.set(line);
            writer.append(key, value);
            i++;
        }
    } finally {
        writer.close();   // flush and close, or the file may be left incomplete
        buffer.close();
    }

On Tue, Feb 21, 2012 at 12:18 PM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote:

Hi,

I think the following link will help:
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html

Cheers
Arko

On Tue, Feb 21, 2012 at 2:04 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

Sorry, maybe it's something obvious, but I was wondering: when map or reduce gets called, what class is used for the key and the value? If I wrote with "org.apache.hadoop.io.Text value = new org.apache.hadoop.io.Text();", would map be called with the Text class?

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

On Tue, Feb 21, 2012 at 11:59 AM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote:

Hi Mohit,

I am not sure that I understand your question, but you can write into a file using:

    BufferedWriter output = new BufferedWriter(
            new OutputStreamWriter(fs.create(my_path, true)));
    output.write(data);

Then you can pass that file as the input to your MapReduce program.
    FileInputFormat.addInputPath(jobconf, new Path(my_path));

From inside your Map/Reduce methods, I think you should NOT be tinkering with the input/output paths of that Map/Reduce job.

Cheers
Arko

On Tue, Feb 21, 2012 at 1:38 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

Thanks. How does MapReduce work on a sequence file? Is there an example I can look at?

On Tue, Feb 21, 2012 at 11:34 AM, Arko Provo Mukherjee <arkoprovomukher...@gmail.com> wrote:

Hi,

Let's say all the smaller files are in the same directory. Then you can do:

    // Output path
    BufferedWriter output = new BufferedWriter(
            new OutputStreamWriter(fs.create(output_path, true)));

    // Input directory
    FileStatus[] output_files = fs.listStatus(new Path(input_path));

    for (int i = 0; i < output_files.length; i++) {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(output_files[i].getPath())));
        String data = reader.readLine();
        while (data != null) {
            output.write(data);
            data = reader.readLine();   // advance the loop; without this it spins forever
        }
        reader.close();
    }
    output.close();

In case you have the files in multiple directories, call the code for each of them with different input paths.

Hope this helps!
Cheers
Arko

On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

I am trying to find examples that demonstrate using sequence files, including writing to one and then running MapReduce on it, but I am unable to find any. Could you please point me to some examples of sequence files?

On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks <bejoy.had...@gmail.com> wrote:

Hi Mohit,

AFAIK, XMLLoader in Pig is not suited for sequence files; please post the same to the Pig user group for a workaround. A SequenceFile is a preferred option when we want to store small files in HDFS that need to be processed by MapReduce, as it stores the data in key-value format. Since SequenceFileInputFormat is available at your disposal, you don't need any custom input format to process it with MapReduce. It is a cleaner and better approach than just appending small XML file contents into one big file.

On Tue, Feb 21, 2012 at 11:00 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

On Tue, Feb 21, 2012 at 9:25 AM, Bejoy Ks <bejoy.had...@gmail.com> wrote:

Mohit,

Rather than just appending the content into a normal text file or so, you can create a sequence file with the individual smaller files' contents as the values.

Thanks.
I was planning to use Pig's org.apache.pig.piggybank.storage.XMLLoader for processing. Would it work with a sequence file?

This text file that I was referring to would be in HDFS itself. Is that still different from using a sequence file?

Regards
Bejoy K.S

On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

We have small XML files. Currently I am planning to append these small files into one file in HDFS so that I can take advantage of splits, larger blocks, and sequential I/O. What I am unsure about is whether it is OK to append one file at a time to this HDFS file.

Could someone suggest if this is OK? I would also like to know how others do it.
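Tying the thread back to the ClassCastException in the first message: the LongWritable key does not come from the data that was written, it comes from the job's input format. The default TextInputFormat hands every mapper a LongWritable byte offset as the key, regardless of what the file contains, so a sequence file of <Text, Text> pairs has to be read through SequenceFileInputFormat, and the mapper's declared types must match the classes passed to SequenceFile.createWriter. The driver below is a minimal sketch, not code from the thread, written against the era's new (org.apache.hadoop.mapreduce) API; the class names `SequenceFileJob` and `XmlMapper` and the pass-through mapper body are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SequenceFileJob {

    // Input types are Text/Text because that is what the writer appended --
    // NOT LongWritable/Text, which is what the default TextInputFormat
    // would supply and which triggers the ClassCastException.
    public static class XmlMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);  // pass-through, for illustration only
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "read sequence file");
        job.setJarByClass(SequenceFileJob.class);

        // The crucial line: read the input through SequenceFileInputFormat,
        // which deserializes the key/value classes recorded in the file
        // header instead of generating LongWritable offsets.
        job.setInputFormatClass(SequenceFileInputFormat.class);

        job.setMapperClass(XmlMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this in place the map signature from the earlier question becomes `map(Text key, Text value, Context context)`; keeping `LongWritable key` there while reading a <Text, Text> sequence file is exactly the mismatch the exception reports. (The block is a job-configuration sketch and needs a Hadoop installation and input paths to actually run.)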