I agree it would be disruptive in the case that you outlined. This is
why we have release notes and semver, though.
I think this change should only go into a major release for downstream
stability. Even though how Accumulo creates and manages files is not
covered by our compatibility statement (though it could be), I don't feel like
it's worth trying to shoe-horn such a change into a minor release.
Adam Fuchs wrote:
We need to consider the scenario in which somebody has written an
application on Accumulo that uses the default compression codec. If we
change the default, their app's behavior will change when they upgrade
Accumulo, either because an existing table will start using snappy or
because their app creates new tables and those new tables will start using
snappy. This change could be disruptive, especially in the case that the
application developers have moved on to other projects and are no longer
available to fix the app. This is why I think you need a stronger argument
than snappy being better on average than gzip. I agree that snappy is
better on average, and even in most cases, but I'm not convinced we should
change the default.
Adam
On Aug 15, 2016 8:14 AM, "Josh Elser"<[email protected]> wrote:
No, I never asserted that Snappy is *always* the better choice. I would
say that I believe Snappy is better in *most cases*.
Most users I talk to (with and without Accumulo involved) have plenty of
disk space available to them. It is rare that space on disk is actually a
concern. Instead, performance is usually the primary metric of concern. To
be crystal clear, this is only my opinion based on users I've talked to, not
an assertion about everyone.
I do not believe I need a better argument than "on average, we can make
out of the box performance better for most users". I suppose we'll have to
disagree on that point. Thanks for clarifying your opinions on the topic.
Adam Fuchs wrote:
If the crux of your argument was that snappy is always a better choice,
then my retort was to say it is not, since sometimes compression ratio can
be a dominant factor. Changes to defaults are disruptive for existing
users, so you need a better argument. I don't mean that you shouldn't
continue to debate the merits. By all means, do continue the conversation.
Adam
On Aug 13, 2016 8:39 PM, "Josh Elser"<[email protected]> wrote:
Your argument fails to address the performance benefits. I could pose the
same question back to you: you need to prove why we shouldn't use the
faster compression algorithm.
I don't mean to be snarky, but your argument is shutting down conversation.
I appreciate you sharing the opinion but don't feel like it's encouraging
discussion.
On Aug 13, 2016 11:18 PM, "Adam Fuchs"<[email protected]> wrote:
In my experience, gz gets roughly 1.5x to 2x better compression than snappy.
Snappy is definitely not a Pareto improvement (although we tend to use
snappy by default). Since it's not always better, I think you would need a
more solid argument to change the default.
Adam
On Aug 13, 2016 8:06 PM, "Josh Elser"<[email protected]> wrote:
Same motivation for using it as for making it the default. I am not aware
of any downside to it. It's become pretty standard across all installations
I've worked with for years.
Asking because I am no oracle on the matter. I could just be ignorant of
some issue, but, given my current understanding, there is no downside for
the average case.
Christopher wrote:
Sorry. I wasn't clear. I understand the motivation for using it... I'm
asking about the motivation for making it the default.
Since both are available, I'm not sure the default matters *that* much, but
it could be an unexpected change for those preferring GZ.
Also, are there any risks regarding library availability of snappy? GZ is
pretty ubiquitous.
On Sat, Aug 13, 2016 at 10:59 PM Josh Elser<[email protected]> wrote:
Uhh, besides what I already mentioned? (close in compressed size but
"much" faster)
Christopher wrote:
What's the motivation for changing it?
On Sat, Aug 13, 2016 at 10:47 PM Josh Elser<[email protected]> wrote:
Any reason we don't want to do this? Last rule-of-thumb I heard was that
snappy is often close enough in compression to GZ but quite a bit faster
(I don't remember exactly how much).
- Josh