Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
The "IOSystem" page has been changed by thomasjungblut: http://wiki.apache.org/hama/IOSystem?action=diff&rev1=1&rev2=2 + <<TableOfContents(5)>> + + == General Information == + + Since Hama 0.4.0 we provide a input and output system for BSP Jobs. + + TODO: Some blahblah about key value and stuff + What's in case when no input is configured? and stuff like that should be documented here.. + + + == Input == + + === Configuring Input === + + When setting up a BSPJob, you can provide a InputFormat and a Path where to find the input. + + {{{ + BSPJob job = new BSPJob(); + // detail stuff omitted + job.setInputPath(new Path("/tmp/test.seq"); + job.setInputFormat(org.apache.hama.bsp.SequenceFileInputFormat.class); + }}} + + Another way to add input paths is following: + {{{ + SequenceFileInputFormat.addInputPath(job, new Path("/tmp/test.seq")); + }}} + + You can also add multiple paths by using this method: + + {{{ + SequenceFileInputFormat.addInputPaths(job, "/tmp/test.seq,/tmp/test2.seq,/tmp/test3.seq"); + }}} + + '''Note that these paths must be separated by a comma.''' + + In case of a {{{SequenceFileInputFormat}}} the key and value pair are parsed from the header. + + When you use want to read a basic textfile with {{{TextInputFormat}}} the key is always {{{LongWritable}}} which contains how much bytes have been read and {{{Text}}} which contains a line of your input. + + + === Using Input === + + You can now read the input from each of the functions in {{{BSP}}} class which has {{{BSPPeer}}} as parameter. (e.G. setup / bsp / cleanup) + + In this case we read a normal text file: + {{{ + @Override + public final void bsp( + BSPPeer<LongWritable, Text, KEYOUT, VALUEOUT> peer) + throws IOException, InterruptedException, SyncException { + + // this method reads the next key value record from file + KeyValuePair<LongWritable, Text> pair = peer.readNext(); + + // the following lines do the same: + LongWritable key = new LongWritable(); + Text value = new Text(); + peer.readNext(key, value); + } + }}} + + Consult the docs for more detail on events like end of file. + + There is also a function which allows you to re-read the input from the beginning. + + This snippet reads the input five times: + + {{{ + for(int i = 0; i < 5; i++){ + LongWritable key = new LongWritable(); + Text value = new Text(); + while (peer.readNext(key, value)) { + // read everything + } + // reopens the input + peer.reopenInput() + } + }}} + + === Custom Inputformat === + + You can implement your own inputformat blabla + + == Output == + + === Configuring Output === + + === Using Input === + + === Custom Outputformat === + + == Implementation notes == + + === Internal implementation details === + BSPJobClient 1. Create the splits for the job @@ -12, +108 @@ 1. Receives splitFile 2. Add split argument to TaskInProgress constructor + Task + + 1. Gets his split from Groom + 2. Initializes everything in BSPPeerImpl +
