Just wanted to chime in here as a user - I've run into similar
behavior from CouchDB with the reduce-not-reducing-enough heuristic,
where stuff I was working on went smoothly in dev, but stopped working once
real load was pushed through it (thankfully for me, that was in
testing, rather than released to customers).

It's a frustrating experience, and I don't think that a reputation for
"works until you cross a threshold, and then it doesn't, but only in
production" is a good thing to move towards.

Perhaps we could add a key to the returned data along the lines of
"_slow_warning": "This query is going to be slow on large data sets.
See http://...", in addition to the ?slow_warning=true query param
(note that I'm calling it "slow_warning" in both places only to
increase discoverability; without the URL param, the no-index query
wouldn't work at all). Bikeshed the name as needed.
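
To make that concrete, here is a minimal sketch in Python of the
behaviour I have in mind. Everything in it is hypothetical: run_find,
the has_index flag, the "no_index" error name and the in-memory
matching are for illustration only and are not CouchDB's actual API or
implementation.

```python
from typing import Any

# Hypothetical warning text; the URL is left elided as in the proposal.
SLOW_WARNING_TEXT = (
    "This query is going to be slow on large data sets. See http://..."
)


def run_find(selector: dict[str, Any], docs: list[dict[str, Any]],
             has_index: bool, params: dict[str, str]) -> dict[str, Any]:
    """Toy stand-in for a query endpoint (not CouchDB code)."""
    opted_in = params.get("slow_warning") == "true"

    # Without an index and without the opt-in param, refuse the query.
    if not has_index and not opted_in:
        return {
            "error": "no_index",  # hypothetical error name
            "reason": "No index for this selector; "
                      "add ?slow_warning=true to scan all documents.",
        }

    # Naive equality match over every document (a full scan).
    matches = [d for d in docs
               if all(d.get(k) == v for k, v in selector.items())]
    body: dict[str, Any] = {"docs": matches}

    # Same name in the response as in the URL, for discoverability.
    if not has_index:
        body["_slow_warning"] = SLOW_WARNING_TEXT
    return body
```

Calling run_find with has_index=False and no params returns the error;
adding {"slow_warning": "true"} returns the docs plus the
"_slow_warning" key; with an index the key is absent.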

I'd like to see a lot more URLs in CouchDB error messages in general,
actually - I would find it very useful when trying to determine what's
going wrong to have a URL right there in the logs that I can get more
information from.

On Sun, Jan 10, 2016 at 11:54 AM, Joan Touzet <[email protected]> wrote:
> Hi Robert,
>
> I've been thinking about this one for a week or so, and I have a
> simple suggestion:
>
>   Add the query parameter slow=true to enable this behaviour.
>
> This meets all the original requirements:
>
> 1. It is not default behaviour
> 2. You can grep the log files for the word 'slow' and find evidence
> 3. There is a shorthand, simple way to enable the behaviour
> 4. Any self-respecting developer will try to remove slow=true, find
>    a break, and be forced to learn about indexes
> 5. It's a bit cheeky, which I think is kind of fun :D
>
> All the best,
> Joan
>
> ----- Original Message -----
>> From: "William Edney" <[email protected]>
>> To: [email protected]
>> Sent: Friday, January 8, 2016 10:27:29 AM
>> Subject: Re: [POC] Mango Catch All Selector
>>
>> Hi Robert -
>>
>> As a builder of UI, API and library code who has also done developer
>> training on a variety of technologies, one simple fix might be to go
>> ahead and not require indexes to be built, but then to put a big NOTE
>> at the beginning of the "Mango Getting Started" guide (I would assume
>> there is such a piece of documentation) that states: "Note that the
>> examples in this document do not require you to build an index, but for
>> performance reasons we HIGHLY RECOMMEND that you do so. *Click here*
>> for more information about how to do that" (or some such verbiage).
>>
>> My 2 cents.
>>
>> Cheers,
>>
>> - Bill
>>
>> On Fri, Jan 8, 2016 at 9:04 AM, Robert Kowalski <[email protected]>
>> wrote:
>>
>> > Hi list,
>> >
>> > At the end of the mail I would like to invite the other folks from
>> > the mailing list that build interfaces for humans (APIs, CLIs or
>> > even UIs) to chime in again with their opinions. So, everyone on the
>> > ML: this mail is not just a response to Paul, feedback is welcome :)
>> >
>> > Hi Paul, I agree with the timeout. It could lead to very unpleasant
>> > errors which are hard to debug and support.
>> >
>> > I added some thoughts to the other points you made:
>> >
>> > > a) know that the slow queries logs exist,
>> >
>> > Hmm... If I take a look at the 1.x logging, it was very
>> > straightforward. As a developer you would spin up a CouchDB and
>> > get all the log messages in your terminal. It was quite handy in
>> > general for all kinds of debugging. That the logs are not displayed
>> > directly on stdout/stderr is in my opinion a general 2.x problem.
>> > The problem occurs with all kinds of log messages we produce in
>> > CouchDB for 2.x and is not specific to the slow-query logging.
>> >
>> >
>> > > Ie, "You can try queries with testing:true, when you're ready to
>> > > move to production you can POST your selector to _index to create
>> > > the index which allows you to remove testing:true".
>> >
>> > I really like the migration path you mentioned here with the API to
>> > create indexes. I am worried about setting too high an entry
>> > barrier for absolute newcomers, people who want to play around
>> > before they are ready to think about indexes, e.g. by coupling the
>> > index topic to querying from the beginning.
>> >
>> > When I throw too many things to learn at people (who may not have
>> > used a database before), most of them get discouraged and do not
>> > take a look. The usual things they feel or say are: "too
>> > complicated", "I have not enough time", "product XY is easier to
>> > use".
>> >
>> > I would argue that newcomers to a database will not launch a
>> > high-traffic, multi-gigabyte product with the database from day one.
>> > Day one is the day where they learn how to query the data and put
>> > data into the database. Even for scenarios where people have a
>> > running high-traffic system, and have used other databases at a
>> > medium to large scale, I would expect, given they migrate to Couch,
>> > that they run both systems in parallel at first in order to fix the
>> > issues that occur during a migration.
>> >
>> > I think we share the same goal (getting beginners started quickly)
>> > and the cool thing about your suggestion is that everyone gets the
>> > required knowledge to run a production system right from the very
>> > start. My suggestion leaves some parts out, but reduces the
>> > cognitive load required to get the very first basic results, e.g. in
>> > a university class setting - or junior developers on their "casual
>> > friday 20% time". My big hope is, once those folks build high-traffic
>> > systems, they remember how easy the usage of CouchDB was and that
>> > they start to learn more about CouchDB in order to run it in a
>> > system with more than a few thousand documents.
>> >
>> >
>> > For us both I think the "what" is clear, but the "how" is a bit
>> > different. I also think this discussion is still making progress,
>> > but I am afraid it could stall. I see that we both have very good
>> > rudiments and I would like to invite the other folks from the
>> > mailing list that build interfaces for humans (APIs, CLIs or even
>> > UIs) to chime in again with their opinions - of course I'm also
>> > looking forward to your answer :)
>> >
>> > Best,
>> > Robert :)
>> >
>> > On Wed, Jan 6, 2016 at 6:21 PM, Paul Davis
>> > <[email protected]>
>> > wrote:
>> > >>> - is a timeout solving the root cause or the symptoms? Could it
>> > >>> be a temporary or additional step in conjunction with query
>> > >>> optimisation tooling?
>> > >>
>> > >> It really depends. From my CouchDB admin and user perspective,
>> > >> this doesn't seem so important to me right now. However, I
>> > >> recognize that there are different usage scenarios with
>> > >> different requirements (e.g. the ones at Cloudant).
>> > >
>> > > I don't think there's anything special about Cloudant in this
>> > > discussion. It's just a question of how do we allow new users the
>> > > ability to easily test and learn the selector/query API while
>> > > also preventing them from going too far without creating indexes
>> > > for their queries. The slow queries messages are fine, but just as
>> > > in any other database they don't really prompt the developer to
>> > > make the correct change. Ie, the developer has to be savvy enough
>> > > to a) know that the slow queries logs exist, b) understand that
>> > > creating an index would speed things up, and then c) know which
>> > > index to create based on the logged query.
>> > >
>> > > In my experience, the group of users that we're concerned about
>> > > in this discussion most likely don't know about any of those
>> > > three things, hence why the current API is designed to force them
>> > > to learn about and understand indexes as part of learning the
>> > > API. Granted, the `_id > null` trick muddies that learning
>> > > process. I would think that replacing the _id trick with
>> > > `"testing": true` or similar would be an obvious indication to
>> > > users that this is a dev/debug type feature and when they went to
>> > > production they would still be pushed to using an index. If we
>> > > add the "create index from selector" API then I think this would
>> > > be a relatively straightforward method of on-ramping to both the
>> > > query and index sides of the API. Ie, "You can try queries with
>> > > testing:true, when you're ready to move to production you can
>> > > POST your selector to _index to create the index which allows you
>> > > to remove testing:true".
>> > >
>> > > That's also why I don't particularly care for the timeout
>> > > approach. It's a binary threshold that a user would (maybe) meet
>> > > after some unknown amount of time after they falsely believe
>> > > their app is working correctly. The feedback is "Everything is
>> > > fine until it isn't". Consider an app that's been working for a
>> > > week or a month or more that suddenly starts throwing timeouts
>> > > for a query. From the user's perspective the database broke
>> > > because the query that used to work fine no longer does. And then
>> > > there's the follow-on question of how that timeout might instruct
>> > > the user that they need an index, and that the fix may be as easy
>> > > as POSTing their selector to the _index endpoint. Sure, Google
>> > > would most likely have the answer if our docs are good enough,
>> > > but by that point the developer is probably already experiencing
>> > > downtime if their app is live, which means they're frantically
>> > > trying to fix the thing. From my point of view, a few roadblocks
>> > > that guide developers towards the correct usage early on would be
>> > > better than letting them get to the adrenaline-fueled expletive
>> > > fountain of downtime.
>> >
>>
