I agree that it makes sense.  It also seems that the goal of seamlessly
moving Hadoop interfaces to use Avro doesn't _require_ that the
parameter names be correct, only that they be consistent.  So, if we
don't have paranamer or Java annotations, we can simply fall back to
auto-generated names like string1, string2, int1, as in AVRO-164.  That
way we get a simple build (I really hate complicated builds, and it'll
probably encourage Avro adoption if we don't need build-system changes).
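
Something like this (just a sketch of the idea, not AVRO-164's actual
code) is all it would take to derive stable fallback names from a
method signature:

    import java.lang.reflect.Method;
    import java.util.HashMap;
    import java.util.Map;

    // Derives deterministic parameter names (string1, string2, int1, ...)
    // from the declared types when no paranamer data or annotations exist.
    public class FallbackNames {
      public static String[] namesFor(Method method) {
        Class<?>[] types = method.getParameterTypes();
        String[] names = new String[types.length];
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int i = 0; i < types.length; i++) {
          // "String" -> "string"; arrays etc. would need extra mapping
          String base = types[i].getSimpleName().toLowerCase();
          Integer seen = counts.get(base);
          int next = (seen == null) ? 1 : seen + 1;
          counts.put(base, next);
          names[i] = base + next;  // e.g. "string1", "string2", "int1"
        }
        return names;
      }
    }

Since the names depend only on the signature, they're consistent across
builds, which is all the wire contract needs.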

If in the future someone changes the method signature of a Hadoop
interface in a way that would break serialization compatibility, then
that might be the time to require annotations (my preference) or
paranamer, or to move the protocol to a .avpr!

It would suck a little that this would effectively bake names like
'string2' into the contract for Hadoop interfaces, but if we can make
the barrier low enough, maybe someone will do a little work on the
Hadoop interfaces.  For me, a Java annotation or manually adding the
paranamer metadata meets that threshold; adding a step to the build
probably doesn't.
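
To make that concrete, the annotation could be as simple as this
(hypothetical names; this isn't an existing Avro API):

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical annotation for pinning a parameter's wire name in
    // source, so the contract no longer depends on auto-generated names.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    public @interface AvroName {
      String value();
    }

    // Used on a protocol method (illustrative interface, not real Hadoop):
    // public interface ExampleProtocol {
    //   long getLength(@AvroName("src") String src,
    //                  @AvroName("replica") int replica);
    // }

No build step involved: the names are read back at runtime via
Method.getParameterAnnotations().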


On the performance issue, I don't think reflection is actually that bad
on Java 6 if you cache the Method/Field objects.  But I think we should
allow external helper objects for serialization.  The helper would
implement the get/set field contract but would act on a 'target
object'.  This would allow us to use existing classes (which could have
extra methods/logic), and it would also let us use the existing Hadoop
objects unchanged.  We would have an implementation that works using
runtime reflection, and we could also code-generate these helpers just
as we do in the Specific case.  If people think this is a good idea,
I'll open a JIRA ticket for it (and maybe even work on it at the
hackat...@digg!)
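
Roughly, the helper contract I have in mind (names are placeholders,
not an existing Avro interface):

    import java.lang.reflect.Field;
    import java.util.HashMap;
    import java.util.Map;

    // Get/set field contract acting on an external target object, so
    // existing classes (e.g. Hadoop's Writables) need no changes.
    interface FieldAccessor<T> {
      Object get(T target, String fieldName);
      void set(T target, String fieldName, Object value);
    }

    // Reflection-based implementation; caching the Field objects up
    // front is what keeps reflection tolerable on Java 6.
    class ReflectAccessor<T> implements FieldAccessor<T> {
      private final Map<String, Field> fields = new HashMap<String, Field>();

      ReflectAccessor(Class<T> c) {
        for (Field f : c.getDeclaredFields()) {  // superclass fields omitted
          f.setAccessible(true);
          fields.put(f.getName(), f);
        }
      }

      public Object get(T target, String fieldName) {
        try {
          return fields.get(fieldName).get(target);
        } catch (IllegalAccessException e) {
          throw new RuntimeException(e);
        }
      }

      public void set(T target, String fieldName, Object value) {
        try {
          fields.get(fieldName).set(target, value);
        } catch (IllegalAccessException e) {
          throw new RuntimeException(e);
        }
      }
    }

A code-generated helper would just implement the same interface with
direct field access instead of reflection.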

Justin

On Thu, Nov 12, 2009 at 10:43 PM, Philip Zeyliger <[email protected]> wrote:

> Makes sense to me.  I think it may be useful to check in the .avpr files
> that are induced on the way, to let folks start trying to use different
> clients for certain operations.
>
> -- Philip
>
> On Wed, Nov 11, 2009 at 11:21 AM, Doug Cutting <[email protected]> wrote:
>
> > Justin Santa Barbara wrote:
> >
> >> What about Philip's point on existing Hadoop interfaces?  Any plans for
> >> how we'll generate the Protocol object for these?
> >>
> >
> > I'm hoping to use reflection initially for this.  That's the motivation
> > for my renewed interest in AVRO-80.
> >
> > https://issues.apache.org/jira/browse/AVRO-80
> >
> > My rationale is that I don't want to assume we'll move Hadoop onto Avro
> > overnight.  So I'd like to move things in a way that's easy to maintain
> > in a branch/patch.  If we can get reflection to work, then we only need
> > to update two places per Hadoop protocol: where it calls RPC.getProxy()
> > and RPC.getServer().  Then we can start looking at performance.  We
> > cannot commit Avro-based Hadoop RPC until performance is adequate, and
> > we don't want to have to maintain a patch that changes many central data
> > structures in Hadoop while we're testing and improving performance,
> > since that might take time.
> >
> > Once we've committed Hadoop to using Avro, then we can consider,
> > protocol-by-protocol, replacing Hadoop's Writable objects with Avro
> > generated objects.  Until then, the protocol will be defined implicitly
> > by Java through reflection.
> >
> > Note that if performance is inadequate due to reflection itself, rather
> > than the client/server implementations, we might resort to byte-code
> > modification to accelerate it.
> >
> > https://issues.apache.org/jira/browse/AVRO-143
> >
> > This would also be a temporary approach.  Longer-term we should move
> > Hadoop to use protocols declared in .avpr files and generated classes.
> > But I don't think that's practical in the short-term.
> >
> > My current short-term goal is to try to get Avro's reflection to the
> > point where it can implement NamenodeProtocol.
> >
> > Does this make sense?
> >
> > Doug
> >
>
