[ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596469#action_12596469 ]
Doug Cutting commented on HADOOP-3380:
--------------------------------------

Under my proposal above, one would create a comparator with:

  RawComparator c = new SerializationFactory(conf).getSerialization(MyKey.class).getComparator();

So a configuration would be involved, and a serialization framework could in theory support configurable comparators. On the other hand, doing so efficiently might be hard. One could, e.g., implement JavaSerialization#getComparator() to read a configuration parameter that names a list of fields and use introspection to order things by those fields. Ideally it would generate comparator code and compile it on the fly, but that's a lot of work.

Record IO provides a single generated comparator that's efficient but not parameterized. Thrift doesn't (yet) even generate comparators! Ideally, IDL-generated serializers might generate a general-purpose parameterized comparator, e.g., compare(int[] fieldIds), where {1,-3} might mean to order by increasing values of the first field and then by decreasing values of the third.

For text input (e.g., tab-separated), one could easily write a configurable comparator. We could use the serialization framework to associate a Serialization for String that does that. Would that suffice for now?

> need comparators in serializer framework
> ----------------------------------------
>
> Key: HADOOP-3380
> URL: https://issues.apache.org/jira/browse/HADOOP-3380
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Reporter: Doug Cutting
>
> The new serialization framework permits Hadoop to incorporate different
> serialization systems, including Hadoop's Writable, Thrift, Java
> Serialization, etc. It provides a generic, extensible means
> (SerializationFactory) to create serializers and deserializers for arbitrary
> Java classes. However, it does not include a generic means to create
> comparators for these classes. Comparators are required for MapReduce keys
> and many other computations.
> Thus we should enhance the serialization
> framework to provide comparators too.
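A configurable comparator for tab-separated text, as suggested in the comment, might look roughly like the following Java sketch. It borrows the {1,-3} convention from the comment: each field id is a 1-based column index, and a negative id means decreasing order. The class name TabSeparatedComparator and the fromSpec helper (for parsing a value such as "1,-3" that might come from a configuration property) are hypothetical. Fields are compared as strings, and a field missing from a line is assumed to compare as the empty string. This sketch implements plain java.util.Comparator<String> over decoded lines rather than Hadoop's RawComparator, which would compare the serialized bytes directly.

```java
import java.util.Comparator;

// Hypothetical configurable comparator for tab-separated text keys.
// Each field id is a 1-based column index; a negative id sorts that column
// in decreasing order, so {1, -3} means "field 1 ascending, then field 3 descending".
final class TabSeparatedComparator implements Comparator<String> {
  private final int[] fieldIds;

  TabSeparatedComparator(int[] fieldIds) {
    for (int id : fieldIds) {
      if (id == 0 || id == Integer.MIN_VALUE) {
        throw new IllegalArgumentException("invalid field id: " + id);
      }
    }
    this.fieldIds = fieldIds.clone();
  }

  // Parses a spec such as "1,-3", e.g. a value read from a configuration property.
  static TabSeparatedComparator fromSpec(String spec) {
    String[] parts = spec.split(",");
    int[] ids = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      ids[i] = Integer.parseInt(parts[i].trim());
    }
    return new TabSeparatedComparator(ids);
  }

  @Override
  public int compare(String a, String b) {
    // A limit of -1 keeps trailing empty fields.
    String[] fa = a.split("\t", -1);
    String[] fb = b.split("\t", -1);
    for (int id : fieldIds) {
      int idx = Math.abs(id) - 1;
      // A field missing from a line compares as the empty string.
      String x = idx < fa.length ? fa[idx] : "";
      String y = idx < fb.length ? fb[idx] : "";
      // Lexicographic (String.compareTo) order, not numeric order.
      int c = Integer.signum(x.compareTo(y));
      if (c != 0) {
        return id > 0 ? c : -c;
      }
    }
    return 0; // all selected fields are equal
  }
}
```

For example, TabSeparatedComparator.fromSpec("1,-3") sorts the lines "b\tx\t1", "a\ty\t1", "a\tz\t2" as "a\tz\t2", "a\ty\t1", "b\tx\t1": ascending by the first field, and descending by the third field among lines whose first fields tie. Because fields are compared lexicographically, "10" sorts before "9"; a fuller version might let the spec request numeric comparison per field.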
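The introspection approach the comment mentions for JavaSerialization#getComparator() can be sketched with java.lang.reflect: the comparator takes a comma-separated list of field names (the kind of value a configuration parameter might hold) and compares two objects field by field. ReflectiveFieldComparator, the leading '-' for descending order, and the sample MyKey fields are assumptions for illustration, not part of any Hadoop API. Each named field is assumed to be declared directly on the class and to hold a non-null Comparable value (Field.get boxes primitive fields, so an int field yields an Integer).

```java
import java.lang.reflect.Field;
import java.util.Comparator;

// Hypothetical comparator that orders objects by named fields via reflection.
// spec is a comma-separated list such as "name,-count"; a leading '-' means descending.
final class ReflectiveFieldComparator<T> implements Comparator<T> {
  private final Field[] fields;
  private final boolean[] descending;

  ReflectiveFieldComparator(Class<T> type, String spec) {
    String[] names = spec.split(",");
    fields = new Field[names.length];
    descending = new boolean[names.length];
    for (int i = 0; i < names.length; i++) {
      String name = names[i].trim();
      descending[i] = name.startsWith("-");
      if (descending[i]) {
        name = name.substring(1);
      }
      try {
        // Only fields declared directly on 'type' are found, not inherited ones.
        Field f = type.getDeclaredField(name);
        f.setAccessible(true); // permit reading private fields
        fields[i] = f;
      } catch (NoSuchFieldException e) {
        throw new IllegalArgumentException("no field '" + name + "' in " + type.getName(), e);
      }
    }
  }

  @Override
  @SuppressWarnings({"unchecked", "rawtypes"})
  public int compare(T a, T b) {
    try {
      for (int i = 0; i < fields.length; i++) {
        // Assumes non-null Comparable values; primitives come back boxed.
        Comparable x = (Comparable) fields[i].get(a);
        int c = Integer.signum(x.compareTo(fields[i].get(b)));
        if (c != 0) {
          return descending[i] ? -c : c;
        }
      }
      return 0; // all listed fields are equal
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
  }
}

// Example key class (illustrative only).
final class MyKey {
  final String name;
  final int count;

  MyKey(String name, int count) {
    this.name = name;
    this.count = count;
  }
}
```

With the spec "name,-count", keys are ordered by name ascending and, for equal names, by count descending. This is the slow path the comment warns about: every comparison does reflective field reads on fully deserialized objects, which is why generating comparator code, or comparing serialized bytes as Hadoop's RawComparator allows, would be preferable for large sorts.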