Re: Dealing with changing file format

Mohit Anchlia Mon, 02 Jul 2012 21:40:37 -0700

On Mon, Jul 2, 2012 at 7:10 PM, Harsh J <ha...@cloudera.com> wrote:

> In addition to what Robert says, using a schema-based approach such as
> Apache Avro can also help here. The schemas in Avro can evolve over
> time if done right, while not breaking old readers.
>


Thanks! Is there a good example of this that I can look at?
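Harsh's point about Avro rests on schema resolution. A reader schema may add a field as long as it gives that field a default, and data written before the field existed still reads correctly. Fields that the writer wrote but the reader does not declare are ignored. The sketch below mimics that rule with plain Python dicts rather than the Apache Avro library, so `resolve`, `READER_SCHEMA` and `OLD_WRITER_FIELDS` are hypothetical names. The real library does this resolution itself when the reader's schema differs from the one the data was written with.

```python
# Hypothetical, stdlib-only illustration of Avro-style schema resolution.
# It does NOT use the Apache Avro library; it only mirrors its rules.

# Field names in older files, written before field4 existed.
OLD_WRITER_FIELDS = ["field1", "field2", "field3"]

# Schema the current jobs read with. field4 was added later with a
# default, which is what keeps older data readable.
READER_SCHEMA = [
    {"name": "field1"},
    {"name": "field2"},
    {"name": "field3"},
    {"name": "field4", "default": ""},
]


def resolve(record, reader_schema):
    """Project a record written with any writer schema onto the reader schema.

    - Fields both sides know are copied from the record.
    - Fields the writer did not write take the reader's default.
    - Fields the writer wrote but the reader does not declare are dropped.
    """
    resolved = {}
    for field in reader_schema:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


old_record = dict(zip(OLD_WRITER_FIELDS, ["a", "b", "c"]))
print(resolve(old_record, READER_SCHEMA))
# {'field1': 'a', 'field2': 'b', 'field3': 'c', 'field4': ''}
```

The design point is that the default lives in the reader's schema, so old files never need rewriting when a field is added.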

>
> On Tue, Jul 3, 2012 at 2:47 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
> > There are several different ways.  One of the ways is to use something
> > like HCatalog to track the format and location of the dataset.  This may
> > be overkill for your problem, but it will grow with you.  Another is to
> > store the schema with the data when it is written out.  Your code may
> > then need to adjust dynamically depending on whether the field is there.
> >
> > --Bobby Evans
> >
> > On 7/2/12 4:09 PM, "Mohit Anchlia" <mohitanch...@gmail.com> wrote:
> >
> >>I am wondering what's the right way to design reading input and writing
> >>output when the file format may change over time. For instance, we might
> >>start with "field1,field2,field3" but at some point add a new field4 to
> >>the input. What's the best way to deal with such scenarios? Keep a
> >>timestamped catalog of changes?
> >
>
>
>
> --
> Harsh J
>
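Robert's second suggestion, storing the schema with the data, can be sketched with a header line that names the columns each file was written with. This is a minimal example under assumptions: the header convention, `EXPECTED_FIELDS` and `read_records` are all hypothetical. Absent fields are filled with a placeholder so downstream code sees a fixed set of keys.

```python
import csv
import io

# Hypothetical: the full set of fields the current job understands.
EXPECTED_FIELDS = ["field1", "field2", "field3", "field4"]


def read_records(text, expected_fields=EXPECTED_FIELDS, missing=None):
    """Yield one dict per data row, keyed by every expected field name.

    Assumes the first line of `text` is a header naming the columns this
    particular file was written with. Fields the file lacks are filled with
    `missing`; columns the job does not know about are dropped.
    """
    reader = csv.DictReader(io.StringIO(text))  # consumes the header row
    for row in reader:
        yield {name: row.get(name, missing) for name in expected_fields}


old_file = "field1,field2,field3\na,b,c\n"
new_file = "field1,field2,field3,field4\na,b,c,d\n"
print(list(read_records(old_file)))
# [{'field1': 'a', 'field2': 'b', 'field3': 'c', 'field4': None}]
print(list(read_records(new_file)))
# [{'field1': 'a', 'field2': 'b', 'field3': 'c', 'field4': 'd'}]
```

One caveat for MapReduce: only the split that starts at the beginning of a file sees a header line, so this pattern suits files read whole or a schema kept in a side file. Avro data files store the writer's schema in the file header, so a reader of any split can look it up.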

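The timestamped catalog raised in the original question could look roughly like this: each entry records when a format version took effect, and a reader picks the latest version in effect when the file was written. The catalog contents, dates and function names here are hypothetical, and the sketch assumes each file's write time is known and trustworthy.

```python
import bisect
from datetime import datetime

# Hypothetical catalog of format versions, sorted by the time each took effect.
FORMAT_CATALOG = [
    (datetime(2012, 1, 1), ["field1", "field2", "field3"]),
    (datetime(2012, 7, 1), ["field1", "field2", "field3", "field4"]),
]


def fields_for(file_timestamp, catalog=FORMAT_CATALOG):
    """Return the field list in effect when a file was written."""
    starts = [start for start, _ in catalog]
    # Index of the last version whose start time is <= file_timestamp.
    idx = bisect.bisect_right(starts, file_timestamp) - 1
    if idx < 0:
        raise LookupError(f"no format registered before {file_timestamp}")
    return catalog[idx][1]


def parse_line(line, file_timestamp):
    """Split a comma-separated line using the format in effect at file_timestamp."""
    fields = fields_for(file_timestamp)
    values = line.rstrip("\n").split(",")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))


print(parse_line("a,b,c", datetime(2012, 3, 15)))
# {'field1': 'a', 'field2': 'b', 'field3': 'c'}
print(parse_line("a,b,c,d", datetime(2012, 8, 1)))
# {'field1': 'a', 'field2': 'b', 'field3': 'c', 'field4': 'd'}
```

Compared with storing the schema inside each file, a catalog leaves the data files untouched but makes correct parsing depend on the catalog and the file timestamps staying accurate.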