a) On the PB dependency.. can't we just use JSON and call it a day? I mean, we're gonna have a new dependency so that we can encode a single tuple? That doesn't even make engineering sense, let alone that the choice of PB looks like a deliberate decision to try and tweak Doug's nose, whether that was the intention or not. Even if you could make a case for some very minor benefit of using PB instead of one of the 3 serialization methods already on the classpath, it's hard to see why it's worth going to the mat over it. And again, as a user, every additional classpath element in Hadoop is a potential future conflict that I'll have to sort out for some non-exciting business process I'm writing.
b) Agreeing with Eric.. backwards compatibility is essential for sequence file. It seems to me that past a certain point, it's easier to just make a new file format rather than cramming further functionality and backwards-compatibility layers into the SequenceFile class, but as long as it's backward compatible then I'm sure people will be fine.

On Tue, Dec 7, 2010 at 10:55 AM, Arun C Murthy <[email protected]> wrote:

>
> On Dec 6, 2010, at 7:36 PM, Eric Sammer wrote:
>
>> I'm a bit confused as to how this equates with sequence files being
>> deprecated or arrested. I tried to read HADOOP-6685 but there's a lot
>> of internal references and context I feel like I'm missing. Suffice it
>> to say, sequence files can *not* be broken for existing data for the
>> reasons everyone has stated. If we choose to focus development
>> elsewhere ("soft deprecate") or actively encourage users elsewhere
>> ("@Deprecated") is an issue I think we can sever from this discussion.
>>
>
> I'm surprised that your are confused.
>
> http://s.apache.org/h6685-veto
>
> Doug is very clear that he is vetoing the patch based on 2 reasons:
> a) dependency on PB
> b) extension to SequenceFile
>
> a) is technical, and we can debate about it.
>
> b) isn't. It's his 'vision' for the project, a vision which hasn't been
> ratified by the PMC.
>
> Arun
