Hi all,

In Avro, map keys are restricted to strings: http://avro.apache.org/docs/current/spec.html#Maps
I have suffered from this restriction myself, and I have found several emails on the mailing list about it, as well as some tickets (e.g. AVRO-1147), one of which is a feature proposal: [AVRO-680](https://issues.apache.org/jira/browse/AVRO-680).

In my use case there are Thrift objects that should be converted to Avro, and these Thrift objects use different types as map keys (which, by the way, is fine for Thrift, C++, C#, Java, JS, Perl, PHP, Python & Ruby). With automatic thrift->avro conversion, the converter simply throws an exception: `main/java/org/apache/avro/thrift/ThriftData.java:222`. Assuming the Thrift objects cannot be changed, building a workaround seems wrong and ugly, at least until it is clear what the reasons for the restriction are. I am really curious to find out why it was done this way (and also to make it better).

So I looked this up and found [AVRO-9](https://issues.apache.org/jira/browse/AVRO-9). I interpret the reasons for the restriction as:

1. Ease of integration with the standard map data structure of "many scripting languages".
2. Simpler implementation as dynamic records, where the key name is mapped to a field name from instance to instance.

I also found an unanswered email about reason 1: http://search-hadoop.com/m/J08Te2HvNbT1

I am really concerned about the "many scripting languages" argument, especially if they are reduced to the subset that Avro supports after some years of project life (and plans to support in the future). I checked the following languages using repl.it, http://codepad.org/ and http://hyperpolyglot.org/scripting, and found that at least int and float can be used as map keys in each of them:

* Ruby
* PHP
* Python
* JS
* Perl

So it no longer looks like a valid argument, while the absence of this feature still makes me and other people suffer, judging by the emails and Jira tickets.

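
As a minimal illustration of the scripting-language point, here is a Python sketch showing that a plain dict accepts int and float keys, and what changes when the keys are forced to strings as an Avro map requires. The variable names and data are hypothetical, made up for this example only.

```python
# Hypothetical data: a map keyed by integers, similar in shape to a
# Thrift map<i32, string>. Names and values are invented for illustration.
names_by_id = {1: "one", 2: "two", 10: "ten"}

# Floats are accepted as dict keys as well.
labels_by_threshold = {0.5: "low", 1.5: "high"}

# Forcing the keys to strings (the shape an Avro map requires) loses the
# numeric key type: the keys now compare as text, not as numbers.
as_string_keyed = {str(k): v for k, v in names_by_id.items()}

int_order = sorted(names_by_id)       # numeric ordering of the int keys
str_order = sorted(as_string_keyed)   # lexicographic ordering of the string keys

print(int_order)
print(str_order)
```

The stringified keys sort as text, so any consumer that needs numeric keys has to parse them back after reading the map.
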
Also, it looks like there was a similar limitation in Cassandra and they [got rid of it](https://issues.apache.org/jira/browse/CASSANDRA-767).

I have worked with Thrift for some time and have not experienced any problems with integers/shorts as map keys (apart from thrift->avro conversion). And the benefit of saving some bytes per record is considerable, because it scales linearly with the number of records.

Also, in protobuf, afaik, there are no dictionaries at all: lists of pairs are used instead, and any type can be used as the key (http://stackoverflow.com/questions/4194845/dictionary-in-protocol-buffers?rq=1). This is also one of the workarounds for this restriction in Avro, but it doesn't solve the thrift->avro conversion case.

So, regarding reason 1, I have serious doubts. I am really interested in the opinion of Doug Cutting and the community.

Regarding reason 2, my concern is that there may be some algorithmic limitations behind the restriction, or other parts of the system that rely heavily on it (MapReduce, Pig, etc). But my brief research did not turn up any reasoning for why the key type should be restricted to String.

I also admit that it may be somewhat more complex to implement than the strings-only solution, but it would definitely eliminate all the workarounds and suffering that Avro users have around it (and would generally lead to less complexity overall). So, in this case, IMHO, more is less :)

I am really looking forward to feedback from the community to discuss and rethink this restriction.

Best regards,
Michael Pershyn
