[ https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352338#comment-15352338 ]
ASF GitHub Bot commented on AVRO-1704:
--------------------------------------

GitHub user rdblue opened a pull request:

    https://github.com/apache/avro/pull/103

    AVRO-1704: Add DatumEncoder API

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/avro AVRO-1704-add-datum-encoder-decoder

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/avro/pull/103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #103

----
commit 79a2993151ea7589c06b854ee7ac8e951816ecce
Author: Ryan Blue <b...@apache.org>
Date:   2016-06-28T03:37:56Z

    AVRO-1869: Java: Fix Decimal conversion from ByteBuffer.

commit 3ca6a15ddf75e4c39468ddd1d454331f3f54f1e3
Author: Ryan Blue <b...@apache.org>
Date:   2016-06-28T03:40:14Z

    AVRO-1704: Java: Add type parameter to createDatumReader and Writer.

commit d91b90544f4486a72da8d3ff5b81dfc3c79d7c2f
Author: Ryan Blue <b...@apache.org>
Date:   2016-06-28T03:41:40Z

    AVRO-1704: Java: Add DatumEncoder and SchemaStore.

commit 7fa75aab405c6460077d7cc7e403c664cce84431
Author: Ryan Blue <b...@apache.org>
Date:   2016-06-28T03:44:06Z

    AVRO-1704: Java: Add toByteArray and fromByteArray to specific.

----


> Standardized format for encoding messages with Avro
> ---------------------------------------------------
>
>          Key: AVRO-1704
>          URL: https://issues.apache.org/jira/browse/AVRO-1704
>      Project: Avro
>   Issue Type: Improvement
>     Reporter: Daniel Schierbeck
>     Assignee: Niels Basjes
>  Attachments: AVRO-1704-2016-05-03-Unfinished.patch,
>               AVRO-1704-20160410.patch
>
>
> I'm currently using the Datafile format for encoding messages that are
> written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync
> markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized,
> meaning that I can read and write data with minimal effort across the various
> languages in use in my organization. If there were a standardized format for
> encoding single values that was optimized for out-of-band schema transfer, I
> would much rather use that.
> I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, e.g. Rabin, MD5, SHA256.
> 3. The actual schema fingerprint (according to the type).
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode
> datums in this format, as well as a MessageReader that, given a SchemaStore,
> would be able to decode datums. The reader would decode the fingerprint and
> ask its SchemaStore to return the corresponding writer's schema.
> The idea is that SchemaStore would be an abstract interface that allowed
> library users to inject custom backends. A simple, file-system-based one
> could be provided out of the box.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
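The five-part message layout and the MessageWriter / MessageReader / SchemaStore split proposed in the issue can be illustrated with a small, JDK-only Java sketch (Java 16+ for the record). Everything in it is an assumption for illustration, not the API or wire format Avro adopted: the class `MessageFormatSketch`, the methods `writeMessage` / `readMessage`, the `InMemorySchemaStore` backend, the one-byte version and fingerprint-type codes (1 = MD5, 2 = SHA-256), and the length-prefix sizes are all hypothetical. The datum is treated as opaque, already Avro-binary-encoded bytes; the Rabin fingerprint type is omitted; and the schema text is hashed as given, whereas the Avro specification computes fingerprints over the schema's Parsing Canonical Form.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class MessageFormatSketch {
  static final byte FORMAT_VERSION = 1, FP_MD5 = 1, FP_SHA256 = 2; // hypothetical, non-standard codes

  // Pluggable lookup from (fingerprint type, fingerprint) to the writer's schema JSON.
  interface SchemaStore {
    String findByFingerprint(byte fingerprintType, byte[] fingerprint);
  }
  record DecodedMessage(String writerSchema, Map<String, String> metadata, byte[] encodedDatum) {}

  // In-memory backend; a file-system-based store could implement the same interface.
  static final class InMemorySchemaStore implements SchemaStore {
    private final Map<String, String> schemas = new LinkedHashMap<>();
    void add(byte type, String schemaJson) {
      schemas.put(type + ":" + Arrays.toString(fingerprint(type, schemaJson)), schemaJson);
    }
    @Override public String findByFingerprint(byte type, byte[] fp) {
      return schemas.get(type + ":" + Arrays.toString(fp));
    }
  }

  // Digests the schema text as given; Avro's spec fingerprints the Parsing Canonical Form instead.
  static byte[] fingerprint(byte type, String schemaJson) {
    String algorithm = type == FP_MD5 ? "MD5" : type == FP_SHA256 ? "SHA-256" : null;
    if (algorithm == null) throw new IllegalArgumentException("unknown fingerprint type " + type);
    try {
      return MessageDigest.getInstance(algorithm).digest(schemaJson.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Strings: a 4-byte big-endian length, then standard (not Java-modified) UTF-8 bytes.
  private static void writeString(DataOutputStream out, String s) throws IOException {
    byte[] b = s.getBytes(StandardCharsets.UTF_8);
    out.writeInt(b.length);
    out.write(b);
  }
  private static String readString(DataInputStream in) throws IOException {
    byte[] b = new byte[in.readInt()];
    in.readFully(b);
    return new String(b, StandardCharsets.UTF_8);
  }

  // Layout: version | fingerprint type | 2-byte length + fingerprint | metadata map | datum.
  static byte[] writeMessage(String schemaJson, byte fpType, Map<String, String> metadata,
      byte[] encodedDatum) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    byte[] fp = fingerprint(fpType, schemaJson);
    out.writeByte(FORMAT_VERSION);
    out.writeByte(fpType);
    out.writeShort(fp.length);
    out.write(fp);
    out.writeInt(metadata.size());
    for (Map.Entry<String, String> e : metadata.entrySet()) {
      writeString(out, e.getKey());
      writeString(out, e.getValue());
    }
    out.write(encodedDatum); // no length prefix: the datum runs to the end of the message
    return bytes.toByteArray();
  }

  // Parses the header, asks the store for the writer's schema, and returns the raw datum bytes.
  static DecodedMessage readMessage(byte[] message, SchemaStore store) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
    if (in.readByte() != FORMAT_VERSION) throw new IOException("unsupported format version");
    byte fpType = in.readByte();
    byte[] fp = new byte[in.readUnsignedShort()];
    in.readFully(fp);
    String writerSchema = store.findByFingerprint(fpType, fp);
    if (writerSchema == null) throw new IOException("unknown schema fingerprint");
    Map<String, String> metadata = new LinkedHashMap<>();
    for (int i = in.readInt(); i > 0; i--) {
      metadata.put(readString(in), readString(in)); // arguments evaluate left to right
    }
    return new DecodedMessage(writerSchema, metadata, in.readAllBytes());
  }

  public static void main(String[] args) throws IOException {
    String schema = "{\"type\":\"record\",\"name\":\"Ping\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";
    InMemorySchemaStore store = new InMemorySchemaStore();
    store.add(FP_SHA256, schema);
    byte[] datum = {0x54}; // Avro binary encoding of Ping{id=42}: zig-zag varint of 42
    DecodedMessage decoded = readMessage(writeMessage(schema, FP_SHA256, Map.of("k", "v"), datum), store);
    System.out.println(decoded.writerSchema().equals(schema) + " " + decoded.metadata());
  }
}
```

Writing the datum last without a length prefix keeps the header small, at the cost that a message must occupy its whole buffer (for example, one Kafka message value). In a real implementation, the writer's schema returned by the store would then be handed to an Avro datum reader, together with the reader's own schema, to decode `encodedDatum`.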