I think the problem here is that the world of Hadoop tends to treat all
files as streams of pure data records. The file boundaries -- and hence any
per-file header or metadata -- don't have any meaning. It's unnatural-ish to
put metadata in data files in this land.

More specifically, I don't quite see how you can have a SequenceFile with
one header record of a different type. They all have to be of the same type.
Sure, you can make a CommentOrDataWritable wrapper class, but that's ugly.
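
To make the wrapper idea concrete, here is a rough sketch of what I mean.
It is not anything that exists in Mahout or Hadoop: the class name, the tag
values and the double[] payload are all made up for illustration, and it
only mimics the write/readFields shape of a Writable using plain java.io so
that it stands alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical tagged wrapper: every record has the same Java type, and a
// one-byte tag says whether the payload is a comment (header) or real data.
public class CommentOrDataRecord {
    private static final byte COMMENT = 0;
    private static final byte DATA = 1;

    private byte tag;
    private String comment;   // set only when tag == COMMENT
    private double[] values;  // set only when tag == DATA

    public static CommentOrDataRecord comment(String text) {
        CommentOrDataRecord r = new CommentOrDataRecord();
        r.tag = COMMENT;
        r.comment = text;
        return r;
    }

    public static CommentOrDataRecord data(double... values) {
        CommentOrDataRecord r = new CommentOrDataRecord();
        r.tag = DATA;
        r.values = values.clone();
        return r;
    }

    public boolean isComment() { return tag == COMMENT; }
    public String getComment() { return comment; }
    public double[] getValues() { return values; }

    // Serialize the tag first, then whichever payload the tag announces.
    public void write(DataOutput out) throws IOException {
        out.writeByte(tag);
        if (tag == COMMENT) {
            out.writeUTF(comment);
        } else {
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        }
    }

    // Read the tag back and branch on it; this branching is the ugly part.
    public void readFields(DataInput in) throws IOException {
        tag = in.readByte();
        if (tag == COMMENT) {
            comment = in.readUTF();
            values = null;
        } else if (tag == DATA) {
            values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            comment = null;
        } else {
            throw new IOException("unknown record tag: " + tag);
        }
    }

    // Round-trip helpers so the sketch can be exercised without Hadoop.
    public static byte[] toBytes(CommentOrDataRecord r) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            r.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static CommentOrDataRecord fromBytes(byte[] bytes) {
        try {
            CommentOrDataRecord r = new CommentOrDataRecord();
            r.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Every reader then has to check isComment() on each record before touching
the payload, which is the ugliness I'm complaining about.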

On Fri, Aug 26, 2011 at 10:24 AM, Lance Norskog <[email protected]> wrote:

> If my camera uploaded raw image files and metadata files separately, I'd go
> mad. The sound sample people got this right 20 years ago, when I wrote SoX.
>
> The difference between throwaway data files and permanently archivable data
> files is having metadata inside the file. Letting Mahout make a
> permanently archivable file opens up its utility tremendously, and
> self-description is the key.
>
>
