Just wanted to chime in here as a user - I've run into similar behavior from CouchDB with the reduce-not-reducing-enough heuristic, where stuff I was working on went smoothly in dev, but stopped working once real load was pushed through it (thankfully for me, that was in testing, rather than released to customers).
It's a frustrating experience, and I don't think that a reputation for
"works until you cross a threshold, and then it doesn't, but only in
production" is a good thing to move towards.

Perhaps something like adding a key to the returned data along the lines
of "_slow_warning": "This query is going to be slow on large data sets.
See http://..." in addition to the ?slow_warning=true query param (note
that I'm calling it "slow_warning" in both places only to increase
discoverability; without the url param, the no-index query wouldn't work
at all). Bikeshed the name as needed.

I'd like to see a lot more URLs in CouchDB error messages in general,
actually - I would find it very useful when trying to determine what's
going wrong to have a URL right there in the logs that I can get more
information from.

On Sun, Jan 10, 2016 at 11:54 AM, Joan Touzet <[email protected]> wrote:
> Hi Robert,
>
> I've been thinking about this one for a week or so, and I have a
> simple suggestion:
>
> Add the query parameter slow=true to enable this behaviour.
>
> This meets all the original requirements:
>
> 1. It is not default behaviour
> 2. You can grep the log files for the word 'slow' and find evidence
> 3. There is a shorthand, simple way to enable the behaviour
> 4. Any self-respecting developer will try to remove slow=true, find
>    a break, and be forced to learn about indexes
> 5. It's a bit cheeky, which I think is kind of fun :D
>
> All the best,
> Joan
>
> ----- Original Message -----
>> From: "William Edney" <[email protected]>
>> To: [email protected]
>> Sent: Friday, January 8, 2016 10:27:29 AM
>> Subject: Re: [POC] Mango Catch All Selector
>>
>> Hi Robert -
>>
>> As a builder of UI, API and library code who has also done developer
>> training on a variety of technologies, one simple fix might be to go
>> ahead and not require indexes to be built, but then to put a big NOTE
>> at the beginning of the "Mango Getting Started" guide (I would assume
>> there is such a piece of documentation) that states: "Note that the
>> examples in this document do not require you to build an index, but
>> for performance reasons we HIGHLY RECOMMEND that you do so. *Click
>> here* for more information about how to do that" (or some such
>> verbiage).
>>
>> My 2 cents.
>>
>> Cheers,
>>
>> - Bill
>>
>> On Fri, Jan 8, 2016 at 9:04 AM, Robert Kowalski <[email protected]>
>> wrote:
>>
>> > Hi list,
>> >
>> > At the end of the mail I would like to invite the other folks from
>> > the mailing list that build interfaces for humans (APIs, CLIs or
>> > even UIs) to chime in again with their opinions. So all people on
>> > the ML, the mail is not just a response to Paul, feedback is
>> > welcome :)
>> >
>> > Hi Paul, I agree with the timeout. It could lead to very unpleasant
>> > errors which are hard to debug and support.
>> >
>> > I added some thoughts to the other points you made:
>> >
>> > > a) know that the slow queries logs exist,
>> >
>> > Hmm... If I take a look at the 1.x logging it was very
>> > straightforward. As a developer you would spin up a CouchDB and you
>> > get all the log messages into your terminal. It was quite handy in
>> > general for all kinds of debugging. That the logs are not displayed
>> > directly on stdout/stderr is in my opinion a general 2.x problem.
>> > The problem does occur with all kinds of log messages we produce in
>> > CouchDB for 2.x and is not specific to the slow-query-logging.
>> >
>> >
>> > > Ie, "You can try queries with testing:true, when you're ready to
>> > > move to production you can POST your selector to _index to create
>> > > the index which allows you to remove testing:true".
>> >
>> > I really like the migration path you mentioned here with the API to
>> > create indexes. I am worried about having too high an entry barrier
>> > for absolute newcomers, people that want to play around before they
>> > are ready to think about indexes, e.g. by coupling the index topic
>> > to querying from the very beginning.
>> >
>> > When I throw too many things to learn at people (who may not have
>> > used a database before), most get discouraged and do not take a
>> > look. The usual things they feel or say are: "too complicated", "I
>> > have not enough time", "product XY is easier to use".
>> >
>> > I would argue that newcomers to a database will not launch a high
>> > traffic, multi-gigabyte product with the database from day one. Day
>> > one is the day where they learn how to query the data and put data
>> > into the database. Even where people have a running high traffic
>> > system and have used other databases at a medium to large scale, I
>> > would expect that, if they migrate to Couch, they run both systems
>> > in parallel at first in order to fix the issues that occur during a
>> > migration.
>> >
>> > I think we share the same goal (getting beginners started quickly)
>> > and the cool thing about your suggestion is that everyone gets the
>> > required knowledge to run a production system right from the very
>> > start. My suggestion leaves some parts out, but reduces the cognitive
>> > load required to get the very first basic results, e.g.
>> > in a university class setting - or junior developers on their
>> > "casual friday 20% time". My big hope is, once those folks build
>> > high traffic systems, they remember how easy the usage of CouchDB
>> > was and that they start to learn more about CouchDB in order to run
>> > it in a system with more than a few thousand documents.
>> >
>> >
>> > For us both I think the "what" is clear, but the "how" is a bit
>> > different. I also think this discussion still makes progress, but I
>> > am afraid it could stall. I see that we both have very good
>> > rudiments and I would like to invite the other folks from the
>> > mailing list that build interfaces for humans (APIs, CLIs or even
>> > UIs) to chime in again with their opinions - of course I'm also
>> > looking forward to your answer :)
>> >
>> > Best,
>> > Robert :)
>> >
>> > On Wed, Jan 6, 2016 at 6:21 PM, Paul Davis
>> > <[email protected]> wrote:
>> > >>> - is a timeout solving the root cause or the symptoms? Could it
>> > >>> be a temporary or additional step as in conjunction with query
>> > >>> optimisation tooling?
>> > >>
>> > >> It really depends. From my CouchDB admin and user perspective,
>> > >> this doesn't seem so important to me right now. However, I
>> > >> recognize that there are different usage scenarios with
>> > >> different requirements (e.g. the ones at Cloudant).
>> > >
>> > > I don't think there's anything special about Cloudant in this
>> > > discussion. It's just a question of how do we allow new users the
>> > > ability to easily test and learn the selector/query API while
>> > > also preventing them from going too far without creating indexes
>> > > for their queries. The slow queries messages are fine, but just
>> > > as any other database they don't really prompt the developer to
>> > > make the correct change.
>> > > Ie, the developer has to be savvy enough to a) know that the
>> > > slow queries logs exist, b) understand that creating an index
>> > > would speed things up, and then c) know which index to create
>> > > based on the logged query.
>> > >
>> > > In my experience, the group of users that we're concerned about
>> > > in this discussion most likely don't know about any of those
>> > > three things, hence why the current API is designed to force them
>> > > to learn about and understand indexes as part of learning the
>> > > API. Granted the `_id > null` trick muddies that learning
>> > > process. I would think that replacing the _id trick with
>> > > `"testing": true` or similar would be an obvious indication to
>> > > users that this is a dev/debug type feature and when they went to
>> > > production they would still be pushed to using an index. If we
>> > > add the "create index from selector" API then I think this would
>> > > be a relatively straightforward method to on ramping to both the
>> > > query and index sides of the API. Ie, "You can try queries with
>> > > testing:true, when you're ready to move to production you can
>> > > POST your selector to _index to create the index which allows you
>> > > to remove testing:true".
>> > >
>> > > That's also why I don't particularly care for the timeout
>> > > approach. It's a binary threshold that a user would (maybe) meet
>> > > after some unknown amount of time after they falsely believe
>> > > their app is working correctly. The feedback is "Everything is
>> > > fine until it isn't". Consider an app that's been working for a
>> > > week or a month or more that suddenly starts throwing timeouts
>> > > for a query. From the user's perspective the database broke
>> > > because the query that used to work fine no longer does.
>> > > And then there's the follow on question on how that timeout
>> > > might instruct the user that they need an index, and that the fix
>> > > may be as easy as POSTing their selector to the _index endpoint.
>> > > Sure Google would most likely have the answer if our docs are
>> > > good enough, but by that point the developer is probably already
>> > > experiencing downtime if their app is live which means they're
>> > > frantically trying to fix the thing. From my point of view, a few
>> > > road blocks that guide developers towards the correct usage early
>> > > on would be better than letting them get to the adrenaline fueled
>> > > expletive fountain of downtime.
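
P.S. To make the "_slow_warning" suggestion from the top of this mail a bit more concrete, here is a rough Python sketch of the behaviour I have in mind. It is a toy model only, not CouchDB code and not an existing API: the function run_find, the NoIndexError exception, the in-memory "index_fields" set and the equality-only selector matching are all made up for illustration, and the URL in the warning text is left elided on purpose.

```python
# Toy model of the proposal; every name here is hypothetical, not CouchDB's API.
SLOW_WARNING_TEXT = (
    "This query is going to be slow on large data sets. See http://..."
)  # URL intentionally left elided


class NoIndexError(Exception):
    """Raised when no index covers the selector and the caller did not opt in."""


def run_find(docs, selector, index_fields, params):
    """Answer an equality-only selector against an in-memory list of docs.

    index_fields: set of field names that are indexed (stand-in for real indexes).
    params: dict of URL query parameters, e.g. {"slow_warning": "true"}.
    """
    # An empty selector is a catch-all, so it always counts as a full scan.
    has_index = bool(selector) and all(f in index_fields for f in selector)
    opted_in = params.get("slow_warning") == "true"

    if not has_index and not opted_in:
        # Without the URL parameter, the no-index query does not run at all.
        raise NoIndexError(
            "no index for this selector; create one or pass ?slow_warning=true"
        )

    matches = [d for d in docs if all(d.get(k) == v for k, v in selector.items())]
    response = {"docs": matches}
    if not has_index:
        # Full scan: put the warning in the response body, where the
        # developer is already looking, not only in the server logs.
        response["_slow_warning"] = SLOW_WARNING_TEXT
    return response
```

The point of the design is that the developer sees the warning on every unindexed response from day one, instead of first hearing about it when a threshold is crossed in production.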
