[ 
https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596469#action_12596469
 ] 

Doug Cutting commented on HADOOP-3380:
--------------------------------------

Under my proposal above, one would create a comparator with:

RawComparator c =
    new SerializationFactory(conf).getSerialization(MyKey.class).getComparator();

So a configuration would be involved, and a serialization framework could in 
theory support configurable comparators.  On the other hand, doing so 
efficiently might be hard.  One could, e.g., implement 
JavaSerialization#getComparator() to read a configuration parameter that names 
a list of fields and use introspection to order things by those fields.  
Ideally it would generate comparator code and compile it on the fly, but that's 
a lot of work.  Record IO provides a single generated comparator that's 
efficient but not parameterized.  Thrift doesn't (yet) even generate 
comparators!  Ideally IDL-generated serializers might generate a 
general-purpose parameterized comparator, e.g., compare(int[] fieldIds), where 
{1,-3} might mean to order by increasing values of the first field and 
decreasing values of the third.
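
To make the signed field-id idea concrete, here is a minimal sketch in plain Java. It is an illustration only, not generated code and not part of any existing framework: the class name FieldOrderComparator is hypothetical, records are modeled as arrays of already-deserialized Comparable values, and the comparison works on objects rather than on raw serialized bytes as an efficient generated comparator would.

```java
import java.util.Comparator;

// Hypothetical sketch: orders records by a signed list of 1-based field ids.
// A positive id sorts that field ascending, a negative id sorts it descending.
// Records are modeled as Comparable[] rows purely for illustration.
@SuppressWarnings({"unchecked", "rawtypes"})
final class FieldOrderComparator implements Comparator<Comparable[]> {
    private final int[] fieldIds;

    FieldOrderComparator(int... fieldIds) {
        for (int id : fieldIds) {
            if (id == 0) {
                throw new IllegalArgumentException("field ids are 1-based");
            }
        }
        this.fieldIds = fieldIds.clone();
    }

    @Override
    public int compare(Comparable[] a, Comparable[] b) {
        for (int id : fieldIds) {
            int index = Math.abs(id) - 1;
            // signum keeps the negation below safe from integer overflow
            int c = Integer.signum(a[index].compareTo(b[index]));
            if (c != 0) {
                return id > 0 ? c : -c;
            }
        }
        return 0; // equal on every listed field
    }
}
```

With this sketch, new FieldOrderComparator(1, -3) orders by increasing first field and, on ties, by decreasing third field.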

For text input (e.g., tab-separated), one could easily write a configurable 
comparator.  We could use the serialization framework to associate String with 
a Serialization that provides such a comparator.  Would that suffice for now?
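
A rough sketch of such a configurable comparator for tab-separated text, in plain Java with no Hadoop dependency. Everything here is an assumption for illustration: the class name TabFieldComparator, the "2,-1" spec syntax (which in practice would presumably be read from a configuration parameter), and the choice to compare fields lexicographically as strings and to treat missing fields as empty.

```java
import java.util.Comparator;

// Hypothetical sketch: compares tab-separated lines by a configurable list of
// signed, 1-based field ids (negative means descending). Fields are compared
// lexicographically as strings; a missing field is treated as empty.
final class TabFieldComparator implements Comparator<String> {
    private final int[] fieldIds;

    private TabFieldComparator(int[] fieldIds) {
        this.fieldIds = fieldIds;
    }

    // Parses a spec such as "2,-1" into a comparator.
    static TabFieldComparator fromSpec(String spec) {
        String[] parts = spec.split(",");
        int[] ids = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            ids[i] = Integer.parseInt(parts[i].trim());
            if (ids[i] == 0) {
                throw new IllegalArgumentException("field ids are 1-based");
            }
        }
        return new TabFieldComparator(ids);
    }

    @Override
    public int compare(String a, String b) {
        // limit -1 keeps trailing empty fields
        String[] fa = a.split("\t", -1);
        String[] fb = b.split("\t", -1);
        for (int id : fieldIds) {
            int index = Math.abs(id) - 1;
            int c = Integer.signum(field(fa, index).compareTo(field(fb, index)));
            if (c != 0) {
                return id > 0 ? c : -c;
            }
        }
        return 0;
    }

    private static String field(String[] fields, int index) {
        return index < fields.length ? fields[index] : "";
    }
}
```

For example, TabFieldComparator.fromSpec("2,-1") sorts lines by the second field ascending and breaks ties by the first field descending.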

> need comparators in serializer framework
> ----------------------------------------
>
>                 Key: HADOOP-3380
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3380
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Doug Cutting
>
> The new serialization framework permits Hadoop to incorporate different 
> serialization systems, including Hadoop's Writable, Thrift, Java 
> Serialization, etc.  It provides a generic, extensible means 
> (SerializationFactory) to create serializers and deserializers for arbitrary 
> Java classes.  However it does not include a generic means to create 
> comparators for these classes.  Comparators are required for MapReduce keys 
> and many other computations.  Thus we should enhance the serialization 
> framework to provide comparators too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
