The data cannot be said to be durable because there is only one set of files, and it can be irreversibly corrupted or lost.
> On Apr 15, 2020, at 3:52 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
>
> FileOutputStream.getFileChannel().force(true) will get all durability we
> need. Just a simple code change?
>
>
>> On Wed, Apr 15, 2020 at 12:32 PM Andrew Purtell <andrew.purt...@gmail.com> wrote:
>>
>> This thread talks of “durability” via filesystem characteristics but also
>> for single system quick Start type deployments. For durability we need
>> multi server deployments. No amount of hacking a single system deployment
>> is going to give us durability as users will expect (“don’t lose my data”).
>> I believe my comments are on topic.
>>
>>
>>>> On Apr 15, 2020, at 11:03 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
>>>
>>> On Wed, Apr 15, 2020 at 10:28 AM Andrew Purtell <apurt...@apache.org> wrote:
>>>
>>>> Nick's mail doesn't make a distinction between avoiding data loss via
>>>> typical tmp cleaner configurations, unfortunately adjacent to mention of
>>>> "durability", and real data durability, which implies more than what a
>>>> single system configuration can offer, no matter how many tweaks we
>>>> make to LocalFileSystem. Maybe I'm being pedantic but this is something
>>>> to be really clear about IMHO.
>>>>
>>>
>>> I prefer to focus the attention of this thread to the question of data
>>> durability via `FileSystem` characteristics. I agree that there are
>>> concerns of durability (and others) around the use of the path under
>>> /tmp. Let's keep that discussion in the other thread.
>>>
>>>> On Wed, Apr 15, 2020 at 10:05 AM Sean Busbey <bus...@apache.org> wrote:
>>>>
>>>>> I think the first assumption no longer holds. Especially with the move
>>>>> to flexible compute environments I regularly get asked by folks what
>>>>> the smallest HBase they can start with for production. I can keep
>>>>> saying 3/5/7 nodes or whatever but I guarantee there are folks who
>>>>> want to and will run HBase with a single node.
>>>>> Probably those deployments won't want to have the distributed flag
>>>>> set. None of them really have a good option for where the WALs go,
>>>>> and failing loud when they try to go to LocalFileSystem is the best
>>>>> option I've seen so far to make sure folks realize they are getting
>>>>> into muddy waters.
>>>>>
>>>>> I agree with the second assumption. Our quickstart in general is too
>>>>> complicated. Maybe if we include big warnings in the guide itself, we
>>>>> could make a quickstart specific artifact to download that has the
>>>>> unsafe disabling config in place?
>>>>>
>>>>> Last fall I toyed with the idea of adding an "hbase-local" module to
>>>>> the hbase-filesystem repo that could start us out with some
>>>>> optimizations for single node set ups. We could start with a fork of
>>>>> RawLocalFileSystem (which will call OutputStream flush operations in
>>>>> response to hflush/hsync) that properly advertises its
>>>>> StreamCapabilities to say that it supports the operations we need.
>>>>> Alternatively we could make our own implementation of FileSystem that
>>>>> uses NIO stuff. Either of these approaches would solve both problems.
>>>>>
>>>>> On Wed, Apr 15, 2020 at 11:40 AM Nick Dimiduk <ndimi...@apache.org> wrote:
>>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> I'd like to bring up the topic of the experience of new users as it
>>>>>> pertains to use of the `LocalFileSystem` and its associated (lack of)
>>>>>> data durability guarantees. By default, an unconfigured HBase runs
>>>>>> with its root directory on a `file:///` path. This path is picked up
>>>>>> as an instance of `LocalFileSystem`. Hadoop has long offered this
>>>>>> class, but it has never supported `hsync` or `hflush` stream
>>>>>> characteristics. Thus, when HBase runs on this configuration, it is
>>>>>> unable to ensure that WAL writes are durable, and thus will ACK a
>>>>>> write without this assurance.
>>>>>> This is the case, even when running in a fully durable WAL mode.
>>>>>>
>>>>>> This impacts a new user, someone kicking the tires on HBase following
>>>>>> our Getting Started docs. On Hadoop 2.8 and before, an unconfigured
>>>>>> HBase will WARN and carry on. On Hadoop 2.10+, HBase will refuse to
>>>>>> start. The book describes a process of disabling stream capability
>>>>>> enforcement as a first step. This is a mandatory configuration for
>>>>>> running HBase directly out of our binary distribution.
>>>>>>
>>>>>> HBASE-24086 restores the behavior on Hadoop 2.10+ to that of running
>>>>>> on 2.8: log a warning and carry on. The critique of this approach is
>>>>>> that it's far too subtle, too quiet for a system operating in a state
>>>>>> known to not provide data durability.
>>>>>>
>>>>>> I have two assumptions/concerns around the state of things, which
>>>>>> prompted my solution on HBASE-24086 and the associated doc update on
>>>>>> HBASE-24106.
>>>>>>
>>>>>> 1. No one should be running a production system on `LocalFileSystem`.
>>>>>>
>>>>>> The initial implementation checked both for `LocalFileSystem` and
>>>>>> `hbase.cluster.distributed`. When running on the former and the latter
>>>>>> is false, we assume the user is running a non-production deployment
>>>>>> and carry on with the warning. When the latter is true, we assume the
>>>>>> user intended a production deployment and the process terminates due
>>>>>> to stream capability enforcement. Subsequent code review resulted in
>>>>>> skipping the `hbase.cluster.distributed` check and simply warning, as
>>>>>> was done on 2.8 and earlier.
>>>>>>
>>>>>> (As I understand it, we've long used the `hbase.cluster.distributed`
>>>>>> configuration to decide if the user intends this runtime to be a
>>>>>> production deployment or not.)
>>>>>>
>>>>>> Is this a faulty assumption?
>>>>>> Is there a use-case we support where we condone running production
>>>>>> deployment on the non-durable `LocalFileSystem`?
>>>>>>
>>>>>> 2. The Quick Start experience should require no configuration at all.
>>>>>>
>>>>>> Our stack is difficult enough to run in a fully durable production
>>>>>> environment. We should make it a priority to ensure it's as easy as
>>>>>> possible to try out HBase. Forcing a user to make decisions about data
>>>>>> durability before they even launch the web ui is a terrible
>>>>>> experience, in my opinion, and should be a non-starter for us as a
>>>>>> project.
>>>>>>
>>>>>> (In my opinion, the need to configure either `hbase.rootdir` or
>>>>>> `hbase.tmp.dir` away from `/tmp` is equally bad for a Getting Started
>>>>>> experience. It is a second, more subtle question of data durability
>>>>>> that we should avoid out of the box. But I'm happy to leave that for
>>>>>> another thread.)
>>>>>>
>>>>>> Thank you for your time,
>>>>>> Nick
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Andrew
>>>>
>>>> Words like orphans lost among the crosstalk, meaning torn from truth's
>>>> decrepit hands
>>>> - A23, Crosstalk
>>>>
>>
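
For reference, here is a minimal Java sketch of the single-file sync that the `force(true)` suggestion in the thread points at. It is an illustration under stated assumptions, not HBase or Hadoop code: the class and method names (`ForceSketch`, `appendDurably`) are hypothetical, and it uses `FileOutputStream.getChannel()`, which is the JDK accessor for the stream's `FileChannel` (the JDK has no `getFileChannel()` method). `FileChannel.force(true)` asks the OS to write the file's content and metadata to the local storage device before returning.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of HBase or Hadoop.
public class ForceSketch {

    // Appends one record and returns only after the OS has been asked to
    // push the file's content and metadata to the local storage device.
    static void appendDurably(Path file, byte[] record) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile(), true)) {
            out.write(record);                  // bytes reach the OS, not necessarily the disk
            FileChannel channel = out.getChannel();
            channel.force(true);                // true = sync content and file metadata
        }
    }

    public static void main(String[] args) throws IOException {
        Path wal = Files.createTempFile("wal-sketch", ".log");
        appendDurably(wal, "put row1\n".getBytes(StandardCharsets.UTF_8));
        appendDurably(wal, "put row2\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.readAllLines(wal));
        Files.delete(wal);
    }
}
```

A sync like this narrows the window for losing an acknowledged write on an OS crash or power loss. It does not change the point made at the top of this message: there is still only one copy of the files, on one machine.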