On Mon, Jul 2, 2012 at 7:10 PM, Harsh J <ha...@cloudera.com> wrote:
> In addition to what Robert says, using a schema-based approach such as
> Apache Avro can also help here. The schemas in Avro can evolve over
> time if done right, while not breaking old readers.
>
Thanks! Is there a good example of this that I can look at?

>
> On Tue, Jul 3, 2012 at 2:47 AM, Robert Evans <ev...@yahoo-inc.com> wrote:
> > There are several different ways. One is to use something
> > like HCatalog to track the format and location of the dataset. This may
> > be overkill for your problem, but it will grow with you. Another is to
> > store the schema with the data when it is written out. Your code may need
> > to adjust dynamically to whether the field is there or not.
> >
> > --Bobby Evans
> >
> > On 7/2/12 4:09 PM, "Mohit Anchlia" <mohitanch...@gmail.com> wrote:
> >
> > > I am wondering what's the right way to design reading input and
> > > writing output when the file format may change over time. For
> > > instance, we might start with "field1,field2,field3", but at some
> > > point we add a new field4 to the input. What's the best way to deal
> > > with such scenarios? Keep a timestamped catalog of changes?
> >
> >
>
> --
> Harsh J
>
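Robert's second suggestion, storing the schema with the data, can be illustrated with a small Python sketch. It is not a Hadoop API and rests on assumptions: each file is assumed to start with a hypothetical header line such as "#schema:field1,field2,field3", values are assumed to contain no embedded commas, and the names write_records, read_records, and READER_FIELDS are made up for illustration. The reader matches columns by name, so a file written before field4 existed simply yields None for it, which is one way of having the code "adjust dynamically" as Robert describes.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO

# Hypothetical convention (an assumption, not a Hadoop feature): the first
# line of every file names its columns, e.g. "#schema:field1,field2,field3".
SCHEMA_PREFIX = "#schema:"

# The fields the current reader code knows about, newest last.
READER_FIELDS = ["field1", "field2", "field3", "field4"]


def write_records(out: TextIO, fields: List[str], rows: List[List[str]]) -> None:
    """Write a header describing the layout, then one comma-separated row per line."""
    out.write(SCHEMA_PREFIX + ",".join(fields) + "\n")
    for row in rows:
        out.write(",".join(row) + "\n")


def read_records(src: TextIO) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per row, keyed by READER_FIELDS.

    Columns are looked up by name using the file's own header, so fields
    missing from an older file come back as None and column order does
    not matter. Assumes values never contain commas.
    """
    header = src.readline().rstrip("\n")
    if not header.startswith(SCHEMA_PREFIX):
        raise ValueError("missing schema header")
    file_fields = header[len(SCHEMA_PREFIX):].split(",")
    for line in src:
        values = line.rstrip("\n").split(",")
        by_name = dict(zip(file_fields, values))
        yield {name: by_name.get(name) for name in READER_FIELDS}


if __name__ == "__main__":
    # A file written before field4 was introduced.
    old_file = io.StringIO()
    write_records(old_file, ["field1", "field2", "field3"], [["a", "b", "c"]])
    old_file.seek(0)
    print(list(read_records(old_file)))
    # [{'field1': 'a', 'field2': 'b', 'field3': 'c', 'field4': None}]
```

Because the header travels with each file, old and new files can be mixed in the same input; HCatalog or a self-describing format such as Avro takes over this bookkeeping so you do not have to maintain a hand-rolled header.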
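Harsh's point that Avro schemas "can evolve over time if done right" comes down to how a reader resolves the writer's schema against its own. Per the Avro specification, record fields are matched by name, fields the reader does not declare are ignored, and a field the writer never wrote is filled from the reader's declared default (an error if there is no default). The Python below mimics only those record rules by hand; it does not use the Avro library, and the record name Event and the function resolve are hypothetical. "Done right" here means giving the new field4 a default (null, via a ["null", "string"] union) so that data written before field4 existed stays readable.

```python
import json
from typing import Any, Dict

# Avro-style record schemas in JSON. The record name "Event" is hypothetical.
# The reader schema adds field4 with a default so older data still resolves.
WRITER_SCHEMA = json.loads("""
{"type": "record", "name": "Event",
 "fields": [{"name": "field1", "type": "string"},
            {"name": "field2", "type": "string"},
            {"name": "field3", "type": "string"}]}
""")

READER_SCHEMA = json.loads("""
{"type": "record", "name": "Event",
 "fields": [{"name": "field1", "type": "string"},
            {"name": "field2", "type": "string"},
            {"name": "field3", "type": "string"},
            {"name": "field4", "type": ["null", "string"], "default": null}]}
""")


def resolve(record: Dict[str, Any], writer_schema: Dict[str, Any],
            reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Project a record written with writer_schema onto reader_schema.

    A hand-rolled illustration of Avro's record-resolution rules, not the
    Avro library: fields match by name, writer-only fields are dropped, and
    reader-only fields take their default or raise if none is declared.
    """
    writer_names = {f["name"] for f in writer_schema["fields"]}
    resolved: Dict[str, Any] = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_names:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]  # JSON null becomes None
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


if __name__ == "__main__":
    old_record = {"field1": "a", "field2": "b", "field3": "c"}
    # New reader, old data: field4 is filled from its default.
    print(resolve(old_record, WRITER_SCHEMA, READER_SCHEMA))
```

The rules also work in the other direction: an old reader given data written with field4 simply ignores it. In real Avro data files the writer's schema is stored in the file header, so a reader only needs to supply its own schema, which is what lets files written before and after the change be read by the same job.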