[ https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353906#comment-15353906 ]

Doug Cutting commented on AVRO-1704:
------------------------------------
I think all the methods are useful, but some of them (e.g., the non-reuse
ones) will always be implemented by boilerplate. They are thus not core to the
interface and are better suited to a base class.

An abstract base class would still permit independent alternative
implementations. The only additional power an interface has is that a class can
implement multiple interfaces. But interfaces don't let you implement
convenience methods, nor do they permit compatible evolution: if you ever add
or remove a method, you break implementations, because you cannot provide
default implementations. If you feel multiple inheritance is important here,
though, it's probably easier to stick with an interface than to, e.g., refactor
into encoder/decoder provider classes that are separate from the user-invoked
classes, or find some other way to avoid such boilerplate implementations.

Encoding to a ByteBuffer should be thread-safe, since it has no caller-visible
state, no?
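
To make the base-class argument concrete, here is a minimal sketch, not the API from the attached patches. The names BaseMessageEncoder and Utf8StringEncoder are hypothetical. The idea is that an implementation supplies only the stream-writing method, while the non-reuse ByteBuffer variant is written once as boilerplate in the abstract class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical names throughout; this is an illustration, not Avro's API.
abstract class BaseMessageEncoder<D> {
  // Core method: the only one a concrete encoder must supply.
  abstract void encode(D datum, OutputStream out) throws IOException;

  // Convenience (non-reuse) variant, implemented once in the base class.
  // A fresh buffer is allocated per call, so no state is shared with callers.
  ByteBuffer encode(D datum) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      encode(datum, out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return ByteBuffer.wrap(out.toByteArray());
  }
}

// Toy implementation, used only to exercise the base class.
class Utf8StringEncoder extends BaseMessageEncoder<String> {
  @Override
  void encode(String datum, OutputStream out) throws IOException {
    out.write(datum.getBytes(StandardCharsets.UTF_8));
  }
}
```

With this shape, adding another convenience method to the base class later would not break Utf8StringEncoder, which is the compatible-evolution point made above.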
> Standardized format for encoding messages with Avro
> ---------------------------------------------------
>
> Key: AVRO-1704
> URL: https://issues.apache.org/jira/browse/AVRO-1704
> Project: Avro
> Issue Type: Improvement
> Reporter: Daniel Schierbeck
> Assignee: Niels Basjes
> Attachments: AVRO-1704-2016-05-03-Unfinished.patch,
> AVRO-1704-20160410.patch
>
>
> I'm currently using the Datafile format for encoding messages that are
> written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync
> markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized,
> meaning that I can read and write data with minimal effort across the various
> languages in use in my organization. If there were a standardized format for
> encoding single values that was optimized for out-of-band schema transfer, I
> would much rather use that.
> I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, i.e. Rabin, MD5, SHA256, etc.
> 3. The actual schema fingerprint (according to the type.)
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode
> datums in this format, as well as a MessageReader that, given a SchemaStore,
> would be able to decode datums. The reader would decode the fingerprint and
> ask its SchemaStore to return the corresponding writer's schema.
> The idea is that SchemaStore would be an abstract interface that allows
> library users to inject custom backends. A simple, file-system-based one
> could be provided out of the box.
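
As a rough illustration of the pieces listed in the issue description, the sketch below frames a datum with a version byte, a fingerprint type, and a fingerprint, and resolves the writer's schema through a store. Everything here is an assumption made for the example: the field widths, the one-byte length prefix, and the names FramedMessage and InMemorySchemaStore. The optional metadata map is left out for brevity, and the datum is treated as opaque bytes.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical layout, for illustration only (all widths are assumptions):
// [version:1][fingerprint type:1][fingerprint length:1][fingerprint][datum]
final class FramedMessage {
  static final byte VERSION = 1;

  final byte fingerprintType;
  final byte[] fingerprint;
  final byte[] datum;

  FramedMessage(byte fingerprintType, byte[] fingerprint, byte[] datum) {
    if (fingerprint.length > 255) {
      throw new IllegalArgumentException("fingerprint too long for 1-byte length");
    }
    this.fingerprintType = fingerprintType;
    this.fingerprint = fingerprint;
    this.datum = datum;
  }

  byte[] toBytes() {
    ByteBuffer buf = ByteBuffer.allocate(3 + fingerprint.length + datum.length);
    buf.put(VERSION).put(fingerprintType).put((byte) fingerprint.length);
    buf.put(fingerprint).put(datum);
    return buf.array();
  }

  static FramedMessage fromBytes(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    byte version = buf.get();
    if (version != VERSION) {
      throw new IllegalArgumentException("unsupported version " + version);
    }
    byte type = buf.get();
    byte[] fp = new byte[buf.get() & 0xFF];
    buf.get(fp);
    byte[] datum = new byte[buf.remaining()]; // the rest is the encoded datum
    buf.get(datum);
    return new FramedMessage(type, fp, datum);
  }
}

// Hypothetical schema lookup; schemas are plain JSON strings in this sketch.
interface SchemaStore {
  String findByFingerprint(byte[] fingerprint); // returns null if unknown
}

final class InMemorySchemaStore implements SchemaStore {
  // ByteBuffer keys compare by content, unlike raw byte[] keys.
  private final Map<ByteBuffer, String> schemas = new HashMap<>();

  void add(byte[] fingerprint, String schemaJson) {
    schemas.put(ByteBuffer.wrap(fingerprint.clone()), schemaJson);
  }

  @Override
  public String findByFingerprint(byte[] fingerprint) {
    return schemas.get(ByteBuffer.wrap(fingerprint));
  }
}
```

A reader would call FramedMessage.fromBytes, pass the recovered fingerprint to the store, and then decode the datum bytes with the returned writer's schema.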