Hi,

I want to write a program that performs a secondary retrieval, but I don't know how to do it. I find it hard to describe, so I hope the source code below helps explain what I mean.

I am not sure whether my first retrieval algorithm is right, but it works. The database file is the input file, and I think it is split across different mappers. My idea was to use a LinkedList to store the new keys generated by the first retrieval, but I don't know how to read the database file from the beginning again. The database file is the same for the first and the second retrieval (args[1]: database path). No reducer is used.

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Retrieval {
    public static void main(String[] args) throws IOException, URISyntaxException {
        if (args.length != 3) {
            System.err.println("Usage: Retrieval <protein set path> <database path> <output path>");
            System.exit(-1);
        }

        JobConf conf = new JobConf(new Configuration(), Retrieval.class);
        conf.setJobName("Retrieval");

        // args[0]: protein set (key file), shipped to every mapper via the distributed cache
        DistributedCache.addCacheFile(new URI(args[0]), conf);
        // args[1]: database file, used as the job input
        FileInputFormat.addInputPath(conf, new Path(args[1]));
        // args[2]: output directory
        FileOutputFormat.setOutputPath(conf, new Path(args[2]));

        conf.setMapperClass(RetrievalMapper.class);
        //conf.setReducerClass(RetrievalReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        JobClient.runJob(conf);
    }
}
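import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedList;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class RetrievalMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private Path[] localFiles;

    public void configure(JobConf conf) {
        try {
            // local copies of the files added with DistributedCache.addCacheFile()
            this.localFiles = DistributedCache.getLocalCacheFiles(conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // one database record: proteinA <TAB> proteinB <TAB> value
        String line = value.toString();
        // stores the first neighbors (local to this map() call)
        LinkedList<String> list = new LinkedList<String>();

        BufferedReader proReader = new BufferedReader(new FileReader(this.localFiles[0].toString()));
        String proID;
        String[] proteinIDs = line.split("\t");
        String tmpString = proteinIDs[0] + "\t" + proteinIDs[1];

        while ((proID = proReader.readLine()) != null) { // for each line (protein ID) in the key file
            if (proID.equalsIgnoreCase(proteinIDs[0])) { // hit, and proteinIDs[1] is its first neighbor
                output.collect(new Text(tmpString), new Text(proteinIDs[2]));
                list.add(proteinIDs[1]); // add first neighbor to list
            }
            if (proID.equalsIgnoreCase(proteinIDs[1])) { // hit, and proteinIDs[0] is its first neighbor
                output.collect(new Text(tmpString), new Text(proteinIDs[2]));
                list.add(proteinIDs[0]); // add first neighbor to list
            }
        }
        proReader.close();
    }
}

--
Regards!
Jun Tan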
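
P.S. In case the description is still unclear, here is a minimal plain-Java sketch (no Hadoop involved) of the result I am after. It assumes the whole database fits in memory as a list of "proteinA<TAB>proteinB<TAB>value" lines, which is not true for my real data. The class name TwoPassRetrieval and the method retrieve are made up for this illustration only. The first pass looks up the original protein set; the second pass scans the same database again, using the first neighbors as the new keys. Unlike my mapper, the sketch returns each matching record only once per pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical in-memory illustration of the two retrieval passes (not Hadoop code).
class TwoPassRetrieval {

    /**
     * One pass over the database. Returns every record "A\tB\tvalue" in which
     * A or B matches one of the keys (ignoring case), and adds the protein on
     * the other side of each hit to neighborsOut.
     */
    static List<String> retrieve(List<String> database, Set<String> keys,
                                 Set<String> neighborsOut) {
        // Normalize the keys once so the comparison ignores case.
        Set<String> lowerKeys = new HashSet<>();
        for (String k : keys) {
            lowerKeys.add(k.toLowerCase(Locale.ROOT));
        }

        List<String> hits = new ArrayList<>();
        for (String line : database) {
            String[] f = line.split("\t");
            if (f.length < 3) {
                continue; // skip malformed records
            }
            boolean hit0 = lowerKeys.contains(f[0].toLowerCase(Locale.ROOT));
            boolean hit1 = lowerKeys.contains(f[1].toLowerCase(Locale.ROOT));
            if (hit0) {
                neighborsOut.add(f[1]); // f[1] is a neighbor of the key f[0]
            }
            if (hit1) {
                neighborsOut.add(f[0]); // f[0] is a neighbor of the key f[1]
            }
            if (hit0 || hit1) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> database = Arrays.asList(
                "P1\tP2\t0.9", "P2\tP3\t0.8", "P4\tP5\t0.7");
        Set<String> proteinSet = new HashSet<>(Arrays.asList("P1"));

        // First retrieval: keys are the original protein set.
        Set<String> firstNeighbors = new LinkedHashSet<>();
        List<String> firstHits = retrieve(database, proteinSet, firstNeighbors);

        // Second retrieval: same database, keys are the first neighbors.
        Set<String> secondNeighbors = new LinkedHashSet<>();
        List<String> secondHits = retrieve(database, firstNeighbors, secondNeighbors);

        System.out.println("first pass:  " + firstHits);
        System.out.println("second pass: " + secondHits);
    }
}
```

What I cannot work out is how to get the same effect in MapReduce, where each mapper only sees its own part of the database and the list of first neighbors is only complete after all mappers have finished.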