I think the problem here is that the world of Hadoop tends to treat all files as streams of pure data records. The file boundaries -- and hence any per-file header or metadata -- don't carry any meaning. It's unnatural-ish to put metadata in data files in this land.
More specifically, I don't quite see how you could have a SequenceFile with one header record of a different type; the records all have to be of the same type. Sure, you can make a CommentOrDataWritable wrapper class, but that's ugly.

On Fri, Aug 26, 2011 at 10:24 AM, Lance Norskog <[email protected]> wrote:
> If my camera uploaded raw image files and metadata files separately, I'd go
> mad. The sound sample people got this right 20 years ago, when I wrote SoX.
>
> The difference between throwaway data files and permanently archivable data
> files is having metadata inside the file. Letting Mahout make a
> permanently archivable file opens up its utility tremendously, and
> self-description is the key.
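
To make the wrapper idea mentioned above concrete, here is a rough sketch of what a CommentOrDataWritable might look like. Everything in it is an assumption for illustration: the field layout, the boolean tag, and the comment()/data() factory methods are hypothetical and not taken from Mahout. In real Hadoop code the class would implement org.apache.hadoop.io.Writable; the sketch only mirrors that interface's write/readFields shape so that it compiles with nothing but java.io.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical wrapper: each record is either a comment (metadata) or a data vector.
// Mirrors the write/readFields shape of Hadoop's Writable without depending on Hadoop.
class CommentOrDataWritable {
    private boolean comment;                  // true: text is valid; false: values is valid
    private String text = "";
    private double[] values = new double[0];

    // No-arg constructor so an instance can be created and then filled by readFields.
    CommentOrDataWritable() {}

    static CommentOrDataWritable comment(String text) {
        CommentOrDataWritable w = new CommentOrDataWritable();
        w.comment = true;
        w.text = text;
        return w;
    }

    static CommentOrDataWritable data(double[] values) {
        CommentOrDataWritable w = new CommentOrDataWritable();
        w.comment = false;
        w.values = values.clone();
        return w;
    }

    boolean isComment() { return comment; }
    String getText() { return text; }
    double[] getValues() { return values.clone(); }

    // Serialize a tag first, then whichever payload the tag selects.
    public void write(DataOutput out) throws IOException {
        out.writeBoolean(comment);
        if (comment) {
            out.writeUTF(text);
        } else {
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        }
    }

    // Read the tag, then the matching payload; reset the unused field.
    public void readFields(DataInput in) throws IOException {
        comment = in.readBoolean();
        if (comment) {
            text = in.readUTF();
            values = new double[0];
        } else {
            int n = in.readInt();
            values = new double[n];
            for (int i = 0; i < n; i++) {
                values[i] = in.readDouble();
            }
            text = "";
        }
    }
}
```

The ugliness shows up on the reading side: every consumer of the file has to check isComment() on each record and skip or handle the metadata case, even though nearly all records are plain data.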
