Hi Doug, thanks for your response. Yes, I had worked it out. However, I felt
there was a need for a SeekableByteArrayInput, so I filed a JIRA issue
(http://issues.apache.org/jira/browse/AVRO-126) and submitted a patch. It
was really useful when storing things in Voldemort: in the case of a K/V
store, it may be overkill to always store the schema along with each value.

Thanks,
Florian

On Mon, Sep 28, 2009 at 12:14 PM, Doug Cutting <[email protected]> wrote:

> Florian Leibert wrote:
>
>> I just figured out that I can just use the GenericDatumWriter instead of
>> the DataFileWriter - the former doesn't store the schema in the file while
>> the latter does.
>>
>
> Florian,
>
> It sounds like you worked this one out for yourself.  Different DatumWriter
> implementations encode equivalent data identically.  They differ in how the
> data is represented in Java, not when serialized.
>
> The best practice with Avro is to store the schema with serialized data, so
> that later, even if the schema in your application has changed, you can
> still read that data.  Avro's data file stores the schema once per file.
> Avro RPC clients pass the MD5 hash of their schema with each request, and,
> when a server has not seen that version of the schema, the client must
> resubmit the request with the full schema.  If you're, e.g., potentially
> storing different versions of a record in a database, then you might
> consider annotating each entry with the hash of its schema and separately
> maintaining a table mapping hashes to schemas, so that applications can
> always find the schema that was used to write the data when processing it.
>
> I hope this helps!
>
> Cheers,
>
> Doug
>
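
For a K/V store, the approach Doug describes (annotate each entry with the
hash of its schema and keep a separate table mapping hashes to schemas) could
be sketched roughly as below. This is only an illustration under assumptions:
the class SchemaTaggedValues and its methods are hypothetical names, not part
of Avro or Voldemort, the hash-to-schema table is an in-memory map standing in
for whatever shared store you would really use, and the hash is taken over the
raw schema JSON text, so two schemas that differ only in whitespace get
different hashes. The payload is assumed to be bytes already produced by a
DatumWriter.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not an Avro or Voldemort class.
class SchemaTaggedValues {
    private static final int HASH_LEN = 16; // an MD5 digest is 16 bytes

    // Stand-in for the separately maintained table: hex hash -> schema JSON.
    private final Map<String, String> schemasByHash = new HashMap<>();

    // MD5 over the schema's JSON text (assumption: text is used as-is).
    static byte[] md5(String schemaJson) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(schemaJson.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Not expected with a standard JDK.
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Records the schema in the table and returns hash + payload,
    // which is what would be stored as the value in the K/V store.
    byte[] tag(String schemaJson, byte[] serializedDatum) {
        byte[] hash = md5(schemaJson);
        schemasByHash.put(hex(hash), schemaJson);
        return ByteBuffer.allocate(HASH_LEN + serializedDatum.length)
                .put(hash)
                .put(serializedDatum)
                .array();
    }

    // Returns the schema the value was written with, or null if unknown.
    String writerSchemaOf(byte[] stored) {
        return schemasByHash.get(hex(Arrays.copyOfRange(stored, 0, HASH_LEN)));
    }

    // Returns the serialized datum without the leading hash.
    static byte[] payloadOf(byte[] stored) {
        return Arrays.copyOfRange(stored, HASH_LEN, stored.length);
    }
}
```

Each stored value then costs 16 extra bytes instead of a full copy of the
schema, and a reader can still recover the writer's schema by looking up the
prefix before decoding the remaining bytes.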
