Hi Hitesh,

Thanks for your inputs. I would like to follow the approach mentioned in the trailing mail, provided the code/processor implementation is done by non-Tez code. But what about the code that Tez itself provides? As I mentioned, the constructor org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.ExternalSorter(TezOutputContext, Configuration, int, long) gets its configuration from org.apache.tez.runtime.library.output.OnFileSortedOutput, which generates the conf using:

this.conf = TezUtils.createConfFromUserPayload(getContext().getUserPayload());

This conf is finally used to create the comparator:

comparator = ConfigUtils.getIntermediateOutputKeyComparator(this.conf);

Please let me know how this can be fixed. Do we need to change org.apache.tez.runtime.library.output.OnFileSortedOutput, or does some workaround exist?

On Fri, Jun 6, 2014 at 10:58 PM, Hitesh Shah <hit...@apache.org> wrote:

> Most of the MR compat layer code in Tez does something like the following:
>
> byte[] userPayload = context.getUserPayload();
> Configuration conf = TezUtils.createConfFromUserPayload(userPayload);
> if (conf instanceof JobConf) {
>   this.jobConf = (JobConf)conf;
> } else {
>   this.jobConf = new JobConf(conf);
> }
>
> Some of the above should probably be fixed given that the deserialized
> payload currently cannot be an instance of JobConf but the above should
> give you an idea as to what is being done. If you look into
> ReduceProcessor, you will see the comparator being initialized
> using ConfigUtils::getInputKeySecondaryGroupingComparator() and it will
> always be passed an instance of JobConf.
>
> Let me know if you are following the above approach or if I am missing
> something which should be addressed in Tez.
>
> thanks
> -- Hitesh
>
> On Jun 6, 2014, at 10:37 AM, Subroto Sanyal <sanyalsubr...@gmail.com> wrote:
>
> > Hi Hitesh,
> >
> > I am trying to build and execute a DAG similar to MR but not exactly
> > MR (it has custom LogicalInput/Output and Processor implementations), which
> > needs intermediate sorting and shuffling (configured via Edge).
> >
> > Let's say we have a RawComparator class which looks like:
> >
> > public class CustomRawComparator implements RawComparator, JobConfigurable {
> >   @Override
> >   public void configure(JobConf conf) {
> >     // some sort of init process
> >     _comparator = blah blah blah
> >   }
> >   @Override
> >   public int compare(Object o1, Object o2) {
> >     return _comparator.compare(o1, o2);
> >   }
> >   @Override
> >   public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
> >     return _comparator.compare(b1, s1, l1, b2, s2, l2);
> >   }
> > }
> >
> > In my jobclient code I will write something like:
> >
> > jobConf.setOutputKeyComparatorClass(CustomRawComparator.class);
> >
> > On the cluster side (whatever the framework, say MRv1, MRv2 or MR on Tez)
> > one would expect to get a fully configured object when
> > ReflectionUtils.newInstance(class, conf) is invoked.
> >
> > The above call is used in the "ExternalSorter" class but, instead of a
> > JobConf, a Configuration object is passed, which doesn't allow the
> > "configure" method of the CustomRawComparator to be invoked.
> > "ExternalSorter" is used in "OnFileSortedOutput". TezUtils provides a
> > utility that returns a Configuration, but not a JobConf.
> >
> > I think there will be other situations/scenarios where this problem exists
> > in the Tez code base.
> >
> > ** I patched the Tez-common so that TezUtils.createConfFromUserPayload
> > returns a JobConf instead of a Configuration, which solves the problem
> > (may not be a good solution).
> >
> > On Fri, Jun 6, 2014 at 6:57 PM, Hitesh Shah <hit...@apache.org> wrote:
> >
> > > Hi Subroto
> > >
> > > Could you provide some more context on what you are trying to do? Are you
> > > trying to run MR-on-Tez or a native Tez job?
> > > If you could provide us with some code showing what you are trying to do,
> > > we can help further. There are probably some bugs in the MR compatibility
> > > that we may not have come across.
> > >
> > > thanks
> > > -- Hitesh
> > >
> > > On Fri, Jun 6, 2014 at 6:53 AM, Subroto Sanyal <sanyalsubr...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Tez has a utility which creates a Configuration object from the payload:
> > > > TezUtils.createConfFromUserPayload(byte[] payload); this method returns a
> > > > Configuration object even though the serialized byte[] can be of type
> > > > JobConf.
> > > >
> > > > Once we get the Configuration we try to create a few objects using
> > > > ReflectionUtils.newInstance(class, conf). ReflectionUtils.newInstance
> > > > checks whether the conf is an instance of
> > > > "org.apache.hadoop.mapred.JobConf" and accordingly invokes the
> > > > "configure" method.
> > > >
> > > > This behavior no longer works in the Tez scenario. One simple scenario
> > > > is when a user defines a custom "RawComparator" and makes it
> > > > "JobConfigurable", but
> > > > org.apache.tez.runtime.library.common.sort.impl.ExternalSorter doesn't
> > > > care whether the configuration could be an instance of
> > > > "org.apache.hadoop.mapred.JobConf".
> > > >
> > > > Please let me know if there is a problem with Tez or if there is a gap
> > > > in my understanding about how objects should be created in Tez :-)
> > > >
> > > > --
> > > > Cheers,
> > > > *Subroto Sanyal*
> >
> > --
> > Cheers,
> > *Subroto Sanyal*

--
Cheers,
*Subroto Sanyal*
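
For reference, the behaviour discussed in this thread can be reproduced in isolation. The following is only a minimal sketch: the nested classes Configuration, JobConf and JobConfigurable are simplified stand-ins written for this example (they are not the Hadoop classes), and newInstance and asJobConf are hypothetical helpers that imitate the JobConf check and the wrapping pattern quoted above, not the actual Hadoop or Tez implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch. Every nested type here is a stand-in for illustration,
// NOT the real org.apache.hadoop / org.apache.tez class of similar name.
class JobConfSketch {

    // Stand-in for a plain Configuration (what the user payload deserializes to).
    static class Configuration {
        final Map<String, String> props = new HashMap<>();
        Configuration() {}
        Configuration(Configuration other) { props.putAll(other.props); }
    }

    // Stand-in for JobConf: a Configuration subclass built by copying another conf.
    static class JobConf extends Configuration {
        JobConf(Configuration other) { super(other); }
    }

    // Stand-in for JobConfigurable.
    interface JobConfigurable {
        void configure(JobConf job);
    }

    // Comparator-like object that is only initialized once configure() has run.
    static class CustomComparator implements JobConfigurable {
        boolean configured = false;
        @Override
        public void configure(JobConf job) { configured = true; }
    }

    // Hypothetical helper imitating the check described in the thread:
    // configure() is invoked only when the conf is a JobConf.
    static <T> T newInstance(Class<T> cls, Configuration conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (conf instanceof JobConf && obj instanceof JobConfigurable) {
                ((JobConfigurable) obj).configure((JobConf) conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // The wrapping pattern from the quoted reply: reuse a JobConf, else wrap.
    static JobConf asJobConf(Configuration conf) {
        return (conf instanceof JobConf) ? (JobConf) conf : new JobConf(conf);
    }

    public static void main(String[] args) {
        Configuration plain = new Configuration();
        // Plain Configuration: configure() is skipped, prints false.
        System.out.println(newInstance(CustomComparator.class, plain).configured);
        // Wrapped in a JobConf first: configure() runs, prints true.
        System.out.println(newInstance(CustomComparator.class, asJobConf(plain)).configured);
    }
}
```

In this sketch, wrapping the deserialized configuration before object creation is enough for configure() to be called, which is why the wrapping pattern works for processor code that controls its own object creation but not for code paths that pass the plain Configuration straight through.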