I like the idea of making snappy the default. However, I am concerned about raising the barrier to entry for new users by adding yet another dependency to install.
On Mon, Aug 15, 2016 at 11:13 AM, Josh Elser <[email protected]> wrote:
> No, I never asserted that Snappy is *always* the better choice. I would say that I believe Snappy is better in *most cases*.
>
> Most users I talk to (with and without Accumulo involved) have plenty of disk space available to them. It is rare that space on disk is actually a concern. Instead, performance is usually the primary metric of concern. To be crystal clear, this is only my opinion on users I've talked to, not an assertion on everyone.
>
> I do not believe I need a better argument than "on average, we can make out of the box performance better for most users". I suppose we'll have to disagree on that point. Thanks for clarifying your opinions on the topic.
>
> Adam Fuchs wrote:
>> If the crux of your argument was that snappy is always a better choice, then my retort was to say it is not, since sometimes compression ratio can be a dominant factor. Changes to defaults are disruptive for existing users, so you need a better argument. I don't mean that you shouldn't continue to debate the merits. By all means, do continue the conversation.
>>
>> Adam
>>
>> On Aug 13, 2016 8:39 PM, "Josh Elser" <[email protected]> wrote:
>>> Your argument fails to address the performance benefits. I could pose the same question back to you: you need to prove why we shouldn't use the faster compression algorithm.
>>>
>>> I don't mean to be snarky, but your argument is shutting down conversation. I appreciate you sharing the opinion but don't feel like it's encouraging discussion.
>>>
>>> On Aug 13, 2016 11:18 PM, "Adam Fuchs" <[email protected]> wrote:
>>>> In my experience gz gets roughly 1.5x to 2x better compression than snappy. Snappy is definitely not a pareto improvement (although we tend to use snappy by default). Since it's not always better I think you would need a more solid argument to change the default.
>>>>
>>>> Adam
>>>>
>>>> On Aug 13, 2016 8:06 PM, "Josh Elser" <[email protected]> wrote:
>>>>> Same motivation of using it as for making it the default. I am not aware of any downside to it. It's become pretty standard across all installations I've worked with for years.
>>>>>
>>>>> Asking because I am no oracle on the matter. I could just be ignorant of some issue, but, given my current understanding, there is no downside for the average case.
>>>>>
>>>>> Christopher wrote:
>>>>>> Sorry. I wasn't clear. I understand the motivation for using it... I'm asking about the motivation for making it the default.
>>>>>>
>>>>>> Since both are available, I'm not sure the default matters *that* much, but it could be an unexpected change for those preferring GZ.
>>>>>>
>>>>>> Also, are there any risks regarding library availability of snappy? GZ is pretty ubiquitous.
>>>>>>
>>>>>> On Sat, Aug 13, 2016 at 10:59 PM Josh Elser <[email protected]> wrote:
>>>>>>> Uhh, besides what I already mentioned? (close in compressed size but "much" faster)
>>>>>>>
>>>>>>> Christopher wrote:
>>>>>>>> What's the motivation for changing it?
>>>>>>>>
>>>>>>>> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser <[email protected]> wrote:
>>>>>>>>> Any reason we don't want to do this? Last rule-of-thumb I heard was that snappy is often close enough in compression to GZ but quite a bit faster (I don't remember exactly how much).
>>>>>>>>>
>>>>>>>>> - Josh
