I think I like this idea. The configuration templates we have, along
with bin/bootstrap_config.sh, would make this easily consumable, IMO.
I'm not sure how many users know about or use that script, though. I
personally still copy from conf/examples/3GB-native and modify it to suit
my whims.
Marc P. wrote:
Perhaps there is a happy medium, though: instead of defining example
configurations by memory footprint, we could define them by performance
profile. Snappy could be the default for those who want a faster but less
space-conscious setup. That would allay Christopher's concerns, and people
trying Accumulo might get better performance out of the box with Snappy.
On Sat, Aug 13, 2016 at 11:19 PM, Christopher wrote:
Native libraries for Snappy are also not typically installed by default on
Linux distros. Even if the Hadoop native libraries are installed, I *think*
the user is likely to end up with the Java implementation by default unless
they take additional action.
On Sat, Aug 13, 2016 at 11:18 PM Adam Fuchs wrote:
In my experience gz gets roughly 1.5x to 2x better compression than
Snappy. Snappy is definitely not a Pareto improvement (although we tend to
use Snappy by default). Since it's not always better, I think you would
need a more solid argument to change the default.
Adam
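To make the "not a Pareto improvement" point concrete, here is a minimal Python sketch. The `CodecProfile` class and `dominates` function are hypothetical names for illustration, and scoring codecs on just two axes (compression ratio and throughput) is an assumption; a real comparison would probably also weigh decompression speed and CPU cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecProfile:
    """Hypothetical per-codec measurements; higher is better on both axes."""
    name: str
    ratio: float       # uncompressed size / compressed size
    throughput: float  # e.g. MB/s; units only need to match across codecs


def dominates(a: CodecProfile, b: CodecProfile) -> bool:
    """True if `a` is a Pareto improvement over `b`: no worse on every
    axis and strictly better on at least one."""
    no_worse = a.ratio >= b.ratio and a.throughput >= b.throughput
    strictly_better = a.ratio > b.ratio or a.throughput > b.throughput
    return no_worse and strictly_better
```

Given a trade-off like the one described above (gz compresses smaller, Snappy runs faster), neither profile dominates the other, so picking a default comes down to which axis you weight more.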
On Aug 13, 2016 8:06 PM, "Josh Elser" wrote:
Same motivation for making it the default as for using it. I am not aware
of any downside to it. It's become pretty standard across all the
installations I've worked with for years.
Asking because I am no oracle on the matter. I could just be ignorant of
some issue, but, given my current understanding, there is no downside in
the average case.
Christopher wrote:
Sorry, I wasn't clear. I understand the motivation for using it... I'm
asking about the motivation for making it the default.
Since both are available, I'm not sure the default matters *that* much, but
it could be an unexpected change for those who prefer GZ.
Also, are there any risks regarding the availability of the Snappy
libraries? GZ is pretty ubiquitous.
On Sat, Aug 13, 2016 at 10:59 PM Josh Elser wrote:
Uhh, besides what I already mentioned? (close in compressed size but
"much" faster)
Christopher wrote:
What's the motivation for changing it?
On Sat, Aug 13, 2016 at 10:47 PM Josh Elser wrote:
Any reason we don't want to do this? The last rule of thumb I heard was
that Snappy is often close enough to GZ in compression but quite a bit
faster (I don't remember exactly how much).
- Josh
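If anyone wants to check that rule of thumb against their own data, a rough harness like the following might help. It is a sketch, not an Accumulo tool: it always includes Python's stdlib `gzip`, and adds Snappy only if the third-party python-snappy package (imported as `snappy`) happens to be installed. The `measure` and `available_codecs` helpers and the sample rows are hypothetical, and timings from Python wrappers won't necessarily match what Hadoop's codecs do inside a tablet server.

```python
import gzip
import time
from typing import Callable, Dict, Tuple


def measure(compress: Callable[[bytes], bytes], data: bytes,
            rounds: int = 5) -> Tuple[float, float]:
    """Return (compression ratio, best wall-clock seconds) for one codec."""
    best = float("inf")
    compressed = b""
    for _ in range(rounds):
        start = time.perf_counter()
        compressed = compress(data)
        best = min(best, time.perf_counter() - start)
    return len(data) / len(compressed), best


def available_codecs() -> Dict[str, Callable[[bytes], bytes]]:
    """gzip is always present; Snappy only if an optional binding loads."""
    codecs: Dict[str, Callable[[bytes], bytes]] = {"gz": gzip.compress}
    try:
        import snappy  # third-party python-snappy; may not be installed
        codecs["snappy"] = snappy.compress
    except (ImportError, AttributeError):
        pass  # no usable Snappy binding: only gzip gets measured
    return codecs


if __name__ == "__main__":
    # Hypothetical key/value-like sample; results depend heavily on real data.
    sample = b"".join(b"row_%06d cf:cq value_%d\n" % (i, i % 97)
                      for i in range(20_000))
    for name, fn in available_codecs().items():
        ratio, secs = measure(fn, sample)
        print(f"{name}: ratio {ratio:.2f}, best of 5 = {secs * 1000:.1f} ms")
```

Taking the best of several rounds reduces noise from warm-up and scheduling; for a real decision you would want to feed it the contents of actual RFiles rather than synthetic rows.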