I want to run groupBy(partition) on the pair RDD below, but it is not working.
In pairvendorData, the same first element (the key) is repeated with many different second elements.
Both elements are objects of my own class. Do I need to override equals and hashCode?
Is groupBy fast enough?

JavaPairRDD<VendorRecord, VendorRecord> pairvendorData =
    matchRdd.flatMapToPair(
        new PairFlatMapFunction<VendorRecord, VendorRecord, VendorRecord>() {

            @Override
            public Iterable<Tuple2<VendorRecord, VendorRecord>> call(VendorRecord t)
                    throws Exception {
                List<Tuple2<VendorRecord, VendorRecord>> pairs =
                    new LinkedList<Tuple2<VendorRecord, VendorRecord>>();
                CompanyMatcherHelper helper = new CompanyMatcherHelper();
                // Build the blocking keys for this record and look up candidate matches.
                MatcherKeys matchkeys = helper.getBlockinkeys(t);
                List<VendorRecord> matchedRecords = ckdao.getMatchingRecordCknids(matchkeys);
                log.info("List size is " + matchedRecords.size());
                // Emit one (record, match) pair per candidate, so t repeats as the key.
                for (VendorRecord matched : matchedRecords) {
                    pairs.add(new Tuple2<VendorRecord, VendorRecord>(t, matched));
                }
                return pairs;
            }
        });
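
To make the equals/hashCode question concrete, here is a minimal plain-Java sketch of the kind of override being asked about. It does not use Spark. VendorKey is a hypothetical stand-in for VendorRecord (its fields vendorId and name are assumptions, since the real class is not shown), and GroupingDemo.groupByKey is an illustrative helper that groups by key with a HashMap. Hash-based grouping of this kind depends on hashCode() and equals(), and without the overrides two records with the same field values would be treated as different keys.

```java
import java.io.Serializable;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for VendorRecord; the field names are assumptions.
final class VendorKey implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String vendorId;
    private final String name;

    VendorKey(String vendorId, String name) {
        this.vendorId = vendorId;
        this.name = name;
    }

    // Two keys are equal when all identifying fields are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VendorKey)) return false;
        VendorKey other = (VendorKey) o;
        return Objects.equals(vendorId, other.vendorId)
            && Objects.equals(name, other.name);
    }

    // Must be consistent with equals: equal keys produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(vendorId, name);
    }
}

class GroupingDemo {
    // Illustrative helper: collects the values of each key into a list,
    // locating keys through hashCode() and equals().
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> groups = new HashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                  .add(pair.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<VendorKey, String>> pairs = new ArrayList<>();
        // Two distinct instances with the same field values, as happens
        // when records are deserialized separately.
        pairs.add(new AbstractMap.SimpleEntry<>(new VendorKey("v1", "Acme"), "match-a"));
        pairs.add(new AbstractMap.SimpleEntry<>(new VendorKey("v1", "Acme"), "match-b"));
        pairs.add(new AbstractMap.SimpleEntry<>(new VendorKey("v2", "Globex"), "match-c"));

        Map<VendorKey, List<String>> groups = groupByKey(pairs);
        System.out.println(groups.size()); // prints 2
    }
}
```

If the equals and hashCode overrides are removed from VendorKey, the two "v1" entries fall back to identity comparison and end up in separate groups, which is the symptom I would expect from a key class without them.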
