Hi Hitesh,

I am trying to build and execute a DAG that is similar to MR but not exactly MR (it has custom LogicalInput/LogicalOutput and Processor implementations) and that needs intermediate sorting and shuffling (configured via the Edge).

Let's say we have a RawComparator class which looks like:

import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobConfigurable;

public class CustomRawComparator implements RawComparator, JobConfigurable {

  // Delegate comparator, initialized in configure()
  private RawComparator _comparator;

  @Override
  public void configure(JobConf conf) {
    // some sort of init process
    _comparator = blah blah blah
  }

  @Override
  public int compare(Object o1, Object o2) {
    return _comparator.compare(o1, o2);
  }

  @Override
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    return _comparator.compare(b1, s1, l1, b2, s2, l2);
  }
}

In my job client code I will write something like:

jobConf.setOutputKeyComparatorClass(CustomRawComparator.class);

On the cluster side (whatever the framework, say MRv1, MRv2 or MR on Tez) one would expect to get a fully configured object when ReflectionUtils.newInstance(class, conf) is invoked.

That call is used in the "ExternalSorter" class, but a Configuration object is passed instead of a JobConf, which does not allow the "configure" method of CustomRawComparator to be invoked. "ExternalSorter" is used in "OnFileSortedOutput".

TezUtils provides a utility that returns a Configuration, but not a JobConf. I think there will be other situations where this problem exists in the Tez code base.

** I patched Tez-common so that TezUtils.createConfFromUserPayload returns a JobConf instead of a Configuration, which solves the problem (though it may not be a good solution).

On Fri, Jun 6, 2014 at 6:57 PM, Hitesh Shah <hit...@apache.org> wrote:

> Hi Subroto
>
> Could you provide some more context on what you are trying to do? Are you
> trying to run MR-on-Tez? or a native Tez job?
> If you could provide us with some code showing what you are trying to do,
> we can help further. There are probably some bugs in the MR compatibility
> that we may have not come across.
>
> thanks
> -- Hitesh
>
>
> On Fri, Jun 6, 2014 at 6:53 AM, Subroto Sanyal <sanyalsubr...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Tez has a utility which creates a Configuration object from the payload:
> >
> > TezUtils.createConfFromUserPayload(byte[] payload); this method returns a
> > Configuration object even though the serialized byte[] can be of type
> > JobConf.
> >
> > Once we get the Configuration we try to create a few objects using
> > ReflectionUtils.newInstance(class, conf). ReflectionUtils.newInstance makes
> > a check whether the conf is an instance of
> > "org.apache.hadoop.mapred.JobConf" and accordingly invokes the "configure"
> > method.
> >
> > This behavior is not working anymore in the Tez scenario. One simple
> > scenario is when a user defines a custom "RawComparator" and makes it
> > "JobConfigurable", but
> > org.apache.tez.runtime.library.common.sort.impl.ExternalSorter doesn't
> > care whether the configuration could be an instance of
> > "org.apache.hadoop.mapred.JobConf".
> >
> > Please let me know if there is a problem with Tez or there is a lack of
> > understanding on my part about how objects should be created in Tez :-)
> >
> > --
> > Cheers,
> > *Subroto Sanyal*
>

--
Cheers,
*Subroto Sanyal*
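
The dispatch discussed in this thread, where a reflective factory only calls "configure" when the configuration it is handed is a JobConf, can be illustrated with a small self-contained sketch. Everything below is an assumption made for illustration: Configuration, JobConf, JobConfigurable and newInstance are simplified stand-ins written for this example, not the real Hadoop or Tez classes, and the "sort.order" key and CustomComparator class are hypothetical. The sketch only mirrors the behavior as described in the messages above.

```java
import java.util.HashMap;
import java.util.Map;

final class JobConfDispatchSketch {

    /** Simplified stand-in for a Hadoop-style Configuration (not the real class). */
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();

        Configuration() {}

        /** Copy constructor, so a JobConf can wrap an existing Configuration. */
        Configuration(Configuration other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Simplified stand-in for JobConf: a Configuration subtype. */
    static class JobConf extends Configuration {
        JobConf() {}

        JobConf(Configuration other) { super(other); }
    }

    /** Simplified stand-in for the JobConfigurable callback interface. */
    interface JobConfigurable {
        void configure(JobConf job);
    }

    /** Hypothetical user class that expects configure() to be called. */
    static class CustomComparator implements JobConfigurable {
        boolean configured = false;
        String order;

        @Override
        public void configure(JobConf job) {
            configured = true;
            order = job.get("sort.order"); // hypothetical key
        }
    }

    /**
     * Stand-in for the reflective factory: instantiate the class, then call
     * configure() only if the conf is a JobConf and the object opts in.
     */
    static <T> T newInstance(Class<T> cls, Configuration conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (conf instanceof JobConf && obj instanceof JobConfigurable) {
                ((JobConfigurable) obj).configure((JobConf) conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Configuration plain = new Configuration();
        plain.set("sort.order", "desc");

        // Passing the plain Configuration: configure() is skipped.
        CustomComparator a = newInstance(CustomComparator.class, plain);
        System.out.println("plain Configuration: configured=" + a.configured);

        // Wrapping it in a JobConf first: configure() is invoked.
        CustomComparator b = newInstance(CustomComparator.class, new JobConf(plain));
        System.out.println("wrapped in JobConf: configured=" + b.configured
                + ", order=" + b.order);
    }
}
```

In this sketch, wrapping the plain Configuration in a JobConf before instantiation is enough to get the comparator configured. That is the same effect the patch mentioned above achieves by changing the return type, though whether wrapping or changing the return type is the better fit for Tez is left open here.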