[ https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352338#comment-15352338 ]

ASF GitHub Bot commented on AVRO-1704:
--------------------------------------

GitHub user rdblue opened a pull request:

    https://github.com/apache/avro/pull/103

    AVRO-1704: Add DatumEncoder API

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rdblue/avro AVRO-1704-add-datum-encoder-decoder

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/avro/pull/103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #103
    
----
commit 79a2993151ea7589c06b854ee7ac8e951816ecce
Author: Ryan Blue <b...@apache.org>
Date:   2016-06-28T03:37:56Z

    AVRO-1869: Java: Fix Decimal conversion from ByteBuffer.

commit 3ca6a15ddf75e4c39468ddd1d454331f3f54f1e3
Author: Ryan Blue <b...@apache.org>
Date:   2016-06-28T03:40:14Z

    AVRO-1704: Java: Add type parameter to createDatumReader and Writer.

commit d91b90544f4486a72da8d3ff5b81dfc3c79d7c2f
Author: Ryan Blue <b...@apache.org>
Date:   2016-06-28T03:41:40Z

    AVRO-1704: Java: Add DatumEncoder and SchemaStore.

commit 7fa75aab405c6460077d7cc7e403c664cce84431
Author: Ryan Blue <b...@apache.org>
Date:   2016-06-28T03:44:06Z

    AVRO-1704: Java: Add toByteArray and fromByteArray to specific.

----
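
The commit messages above only name the new pieces. Purely as an
illustration of the shape they describe (these are not the actual
signatures in pull request #103, which may differ), the single-datum
encode/decode split might look roughly like this:

    // Illustrative sketch only; the real interfaces in the pull request may differ.
    import java.io.IOException;
    import java.nio.ByteBuffer;

    /** Encodes single datums of type D into a self-describing byte payload. */
    interface DatumEncoder<D> {
      ByteBuffer encode(D datum) throws IOException;
    }

    /** Decodes payloads back into datums, resolving the writer schema from the
     *  fingerprint carried in the payload. */
    interface DatumDecoder<D> {
      D decode(ByteBuffer payload) throws IOException;
    }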


> Standardized format for encoding messages with Avro
> ---------------------------------------------------
>
>                 Key: AVRO-1704
>                 URL: https://issues.apache.org/jira/browse/AVRO-1704
>             Project: Avro
>          Issue Type: Improvement
>            Reporter: Daniel Schierbeck
>            Assignee: Niels Basjes
>         Attachments: AVRO-1704-2016-05-03-Unfinished.patch, 
> AVRO-1704-20160410.patch
>
>
> I'm currently using the Datafile format for encoding messages that are 
> written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync 
> markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized, 
> meaning that I can read and write data with minimal effort across the various 
> languages in use in my organization. If there were a standardized format for 
> encoding single values that was optimized for out-of-band schema transfer, I 
> would much rather use that.
> I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, e.g. Rabin, MD5, SHA256, etc.
> 3. The actual schema fingerprint (according to the type).
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode 
> datums in this format, as well as a MessageReader that, given a SchemaStore, 
> would be able to decode datums. The reader would decode the fingerprint and 
> ask its SchemaStore to return the corresponding writer's schema.
> The idea is that SchemaStore would be an abstract interface that allowed 
> library users to inject custom backends. A simple, file-system-based one 
> could be provided out of the box.
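
The five pieces listed above can be prototyped against Avro's existing
public Java API. Below is a minimal encoder sketch, assuming a one-byte
version marker, a fixed CRC-64-AVRO (Rabin) fingerprint written
little-endian, and no metadata map; none of these choices is mandated by
the issue, and the schema is only an example:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaNormalization;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    public class SingleMessageEncoderSketch {

      private static final Schema SCHEMA = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
              + "[{\"name\":\"name\",\"type\":\"string\"}]}");

      /** Encodes one datum as: 1-byte version, 8-byte Rabin fingerprint, binary body. */
      public static byte[] encode(GenericRecord datum) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // 1. Format version number (the value 1 here is purely illustrative).
        out.write(0x01);

        // 2 + 3. The fingerprint type is fixed to CRC-64-AVRO ("Rabin") in this
        // sketch, so only the 8-byte fingerprint of the writer schema is written.
        long fp = SchemaNormalization.parsingFingerprint64(datum.getSchema());
        byte[] fpBytes =
            ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(fp).array();
        out.write(fpBytes);

        // 4. The optional metadata map is omitted in this sketch.

        // 5. The binary-encoded datum itself: no inlined schema, no sync markers.
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(datum.getSchema()).write(datum, encoder);
        encoder.flush();
        return out.toByteArray();
      }

      public static void main(String[] args) throws IOException {
        GenericData.Record user = new GenericData.Record(SCHEMA);
        user.put("name", "alice");
        System.out.println(encode(user).length + " bytes");
      }
    }

A real format would still have to fix the byte order and make room for the
fingerprint type identifier and metadata map; the point of the sketch is
only how little framing is needed compared to a data file.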
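
The reading side, again only as a sketch: the SchemaStore name comes from
the description above (and from the commits in the pull request), but the
single lookup method shown here is an assumption, as is the in-memory
backend.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaNormalization;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DecoderFactory;

    public class MessageReaderSketch {

      /** Hypothetical pluggable lookup from schema fingerprint to writer schema. */
      public interface SchemaStore {
        Schema findByFingerprint(long fingerprint);
      }

      /** In-memory backend; a file-system or registry-backed one could be swapped in. */
      public static class InMemorySchemaStore implements SchemaStore {
        private final Map<Long, Schema> schemas = new HashMap<>();

        public void add(Schema schema) {
          schemas.put(SchemaNormalization.parsingFingerprint64(schema), schema);
        }

        @Override
        public Schema findByFingerprint(long fingerprint) {
          return schemas.get(fingerprint);
        }
      }

      /** Decodes a payload produced by the encoder sketch above. */
      public static GenericRecord decode(byte[] payload, SchemaStore store, Schema readerSchema)
          throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != 0x01) {              // 1-byte format version
          throw new IOException("Unsupported format version");
        }
        long fingerprint = buf.getLong();     // 8-byte writer-schema fingerprint

        Schema writerSchema = store.findByFingerprint(fingerprint);
        if (writerSchema == null) {
          throw new IOException("Unknown writer schema for fingerprint " + fingerprint);
        }

        // Standard Avro schema resolution: decode with the writer schema,
        // project onto the reader schema.
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, readerSchema);
        return reader.read(null, DecoderFactory.get().binaryDecoder(
            payload, buf.position(), payload.length - buf.position(), null));
      }
    }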


