Andrzej Bialecki wrote:
> Shouldn't such changes be reserved for major releases, i.e. for 0.7?
> Nutch relies heavily on UTF8 being the default, this change will make it
> more difficult to upgrade it to 0.6.2.
Good question. I think the intent was to switch as much as possible
from UTF8 to Text in 0.6. Lots of things were switched, but these
defaults were missed. So I was considering 0.6 the major release that
contains the change from UTF8 to Text in public APIs.
Right now, in 0.6, the defaults are not consistent with each other:
the default input format, TextInputFormat, now returns Text, while the
defaults that were missed still use UTF8. In our current monthly
release strategy, the .0 releases are effectively alphas, candidates
that sometimes are good enough to become the final release, and
sometimes require point releases.
The other consistent alternative would be to revert the places where
UTF8 was already changed to Text:
http://issues.apache.org/jira/browse/HADOOP-450 (TextInputFormat)
http://issues.apache.org/jira/browse/HADOOP-499 (contrib/streaming)
http://issues.apache.org/jira/browse/HADOOP-460 (smallJobsBenchmark)
So should we revert these in 0.6?
The patch in http://issues.apache.org/jira/browse/HADOOP-533 seemed like
the simplest way to make 0.6 consistent.
I hate incompatible changes, and I didn't see a way to make this one
compatibly, but it still seems like a good change. What do you think?
Doug