Andrzej Bialecki wrote:
> Shouldn't such changes be reserved for major releases, i.e. for 0.7? Nutch relies heavily on UTF8 being the default; this change will make it more difficult to upgrade it to 0.6.2.

Good question. I think the intent was to switch as much as possible from UTF8 to Text in 0.6. Lots of things were switched, but these defaults were missed. So I was considering 0.6 the major release that contains the change from UTF8 to Text in public APIs.

Right now, in 0.6, the defaults are inconsistent: TextInputFormat now returns Text, not UTF8, while the defaults that were missed still use UTF8. Under our current monthly release strategy, the .0 releases are effectively alphas: candidates that are sometimes good enough to become the final release and sometimes require point releases.

The other way to be consistent would be to revert the places where UTF8 was already changed to Text:

http://issues.apache.org/jira/browse/HADOOP-450 (TextInputFormat)
http://issues.apache.org/jira/browse/HADOOP-499 (contrib/streaming)
http://issues.apache.org/jira/browse/HADOOP-460 (smallJobsBenchmark)

So should we revert these in 0.6?

The patch in http://issues.apache.org/jira/browse/HADOOP-533 seemed like the simplest way to make 0.6 consistent.

I hate incompatible changes, and I didn't see a way to make this one compatibly, but it still seems like a good change. What do you think?

Doug