Re: [POC] Mango Catch All Selector

Robert Kowalski Fri, 12 Feb 2016 17:27:40 -0800

The new behaviour for Mango landed on master this week. I hope you all enjoy it!


Please report any bugs, problems and feedback, and also praise :)

On Mon, Jan 18, 2016 at 11:59 AM, Jan Lehnardt <[email protected]> wrote:
> This is awesome: +1
>
>
>> On 18 Jan 2016, at 00:16, Robert Kowalski <[email protected]> wrote:
>>
>> Heya,
>>
>> Thanks again for all the feedback! I built a prototype and added a demo
>> video!
>>
>>> I think the current design constraint around text is a good one, and I'm
>>> unconvinced including English text is a good direction.
>>>
>>> If you want to take this direction, including a URL to our documentation
>>> instead (which *is* internationalized) is probably a better way to go,
>>> something like:
>>> .... {"_warning": "http://docs.couchdb.org/en/2.0.0/....."}]
>>
>> I really like this idea! I thought about it for a long time, and I think
>> it grows the scope of the current task. Right now all strings CouchDB
>> returns to the user are written in English; the current message saying
>> that no index exists is also in English. Sadly, our documentation is not
>> internationalised yet: as far as I know, no language has a complete
>> translation, and the translations are not available as a website or in
>> any other public form. I stopped translating into German myself because
>> the promised integration into the doc build was still unfinished after
>> ~1.5 years. For the task at hand I would like to keep the scope as small
>> as possible. This does not mean that I would stand in the way if folks
>> want to add i18n to the project and its sub-projects and have the
>> tooling and time to maintain it.
>>
>>
>> Since a prototype says more than 1000 posts, I hacked one together that
>> includes the warning Garren proposed. You can check it out at
>> https://github.com/apache/couchdb-mango/pull/27 - or watch the video:
>> https://cloudup.com/cEnbWqbX5Y7
>>
>> What do you think?
>>
>> On Wed, Jan 13, 2016 at 11:58 PM, Jan Lehnardt <[email protected]> wrote:
>>>
>>>> On 13 Jan 2016, at 23:41, Joan Touzet <[email protected]> wrote:
>>>>
>>>> Warning: If we start using English text in a response such as this,
>>>> we'll need to start externalising strings and internationalising them.
>>>> We've never had to do this before because our API is, in general, terse
>>>> and relies on HTTP status codes to indicate when something has gone
>>>> wrong.
>>>>
>>>> I think the current design constraint around text is a good one, and I'm
>>>> unconvinced including English text is a good direction.
>>>>
>>>> If you want to take this direction, including a URL to our documentation
>>>> instead (which *is* internationalized) is probably a better way to go,
>>>> something like:
>>>>
>>>> .... {"_warning": "http://docs.couchdb.org/en/2.0.0/....."}]
>>>
>>> bikeshed: maybe slow_warning (like we use not_found on 404s), but yeah,
>>> something like this!
>>>
>>> Great discussion everyone. I like how we are all making this idea
>>> better together :)
>>>
>>> Best
>>> Jan
>>> --
>>>
>>>
>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Robert Kowalski" <[email protected]>
>>>> To: [email protected]
>>>> Sent: Wednesday, January 13, 2016 2:47:27 PM
>>>> Subject: Re: [POC] Mango Catch All Selector
>>>>
>>>> Hi Garren,
>>>>
>>>> what would selector: null do? Return all docs?
>>>>
>>>> Where in CouchDB's response would the warning be? Next to the result
>>>> set, like
>>>>
>>>> [{"_id": "foo", "_rev": "535"}, {"_warning": "slow query, use an index for
>>>> better performance"}] ?
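To make the trade-off in this question concrete, here is a minimal Python sketch, assuming exactly the array shape quoted above (result documents followed by an object whose only key is `_warning`). `split_results` is a hypothetical client helper, not part of CouchDB or of any client library.

```python
import json
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_results(rows: List[Row]) -> Tuple[List[Row], List[str]]:
    """Split a proposed _find response array into documents and warnings.

    Assumes the shape floated in this thread: result documents plus
    objects whose only key is "_warning".
    """
    docs: List[Row] = []
    warnings: List[str] = []
    for row in rows:
        if set(row) == {"_warning"}:
            warnings.append(row["_warning"])
        else:
            docs.append(row)
    return docs, warnings


body = (
    '[{"_id": "foo", "_rev": "535"}, '
    '{"_warning": "slow query, use an index for better performance"}]'
)
docs, warnings = split_results(json.loads(body))
print(docs)      # [{'_id': 'foo', '_rev': '535'}]
print(warnings)  # ['slow query, use an index for better performance']
```

Because documents and warnings share one array in this shape, every client has to inspect each row like this; a separate top-level key in an object response would avoid that filtering step.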
>>>>
>>>> On Wednesday, 13 January 2016, Garren Smith wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> I think you misunderstood me: I don’t want it to be a different
>>>>> endpoint. I just don’t want a user to have to do queries like
>>>>> find({slow: true}). I want them to be able to do a query such as
>>>>> find({}) or find({selector: null}) and then get back the results along
>>>>> with a warning message telling them that this query would be slow in
>>>>> production. The lower the barrier to entry here, the better. I know we
>>>>> want to protect our users for when they go to production, but forcing
>>>>> them to add a slow: true flag won’t help. It will still require them to
>>>>> read the docs a lot more than most people are willing to on a first
>>>>> attempt at something new.
>>>>>
>>>>> Cheers
>>>>> Garren
>>>>>> On 12 Jan 2016, at 9:16 PM, Robert Kowalski <[email protected]> wrote:
>>>>>>
>>>>>> Thank you all for your feedback!
>>>>>>
>>>>>> I like the idea of the error message with a new URL.
>>>>>>
>>>>>> I agree with Garren that it should be a separate endpoint. It removes
>>>>>> some complexity when explaining each endpoint.
>>>>>>
>>>>>> Maybe: `/_find_slow`?
>>>>>>
>>>>>> On Tue, Jan 12, 2016 at 10:36 AM, Jan Lehnardt <[email protected]> wrote:
>>>>>>>
>>>>>>>> On 11 Jan 2016, at 19:55, Tony Sun <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi Robert,
>>>>>>>>
>>>>>>>> Building upon what others have stated above, what do you think about
>>>>>>>> the following:
>>>>>>>>
>>>>>>>> 1) Let the user query without creating an index
>>>>>>>> 2) Return an error message with a new URL that has
>>>>>>>> "slow/no_index/developer":true appended at the end. The message
>>>>>>>> clearly explains that this query will be slow and that creating an
>>>>>>>> index would be more efficient, but the user can still continue. The
>>>>>>>> error message will also link to our documentation.
>>>>>>>> 3) In Fauxton, there is a checkbox or button that also appends
>>>>>>>> "slow/no_index/developer":true to the _find URL. If the user clicks
>>>>>>>> it, the same message pops up to notify the user.
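Step 2 of this list could be sketched server-side in Python, under loud assumptions: the flag travels as a URL query parameter (the mail writes it as `"slow/no_index/developer":true`, which could also mean a JSON body field), the flag name stays a parameter because the thread had not settled on one, and the error name `no_index`, the `retry_url` key and the `no_index_error` helper are all hypothetical, not CouchDB's actual API.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def no_index_error(request_url: str, flag: str, docs_url: str) -> Dict[str, str]:
    """Build a hypothetical 'no index' error body carrying an opt-in retry URL."""
    parts = urlsplit(request_url)
    query = dict(parse_qsl(parts.query))
    query[flag] = "true"  # add the opt-in flag, keeping any existing params
    retry_url = urlunsplit(parts._replace(query=urlencode(query)))
    return {
        "error": "no_index",
        "reason": (
            "This query will be slow without an index; creating one is more "
            "efficient. Retry with retry_url to run it anyway."
        ),
        "retry_url": retry_url,
        "docs": docs_url,  # link to the documentation, supplied by the caller
    }


err = no_index_error(
    "http://localhost:5984/db/_find?limit=10", "slow", "http://example.invalid/docs"
)
print(err["retry_url"])  # http://localhost:5984/db/_find?limit=10&slow=true
```

Step 3 (the Fauxton checkbox) would amount to the UI appending the same parameter before sending the request.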
>>>>>>>
>>>>>>>
>>>>>>> I like this!
>>>>>>>
>>>>>>>
>>>>>>> Jan
>>>>>>> --
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Tony
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 11, 2016 at 9:45 AM, Eli Stevens (Gmail) <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Just wanted to chime in here as a user - I've run into similar
>>>>>>>>> behavior from CouchDB with the reduce-not-reducing-enough heuristic,
>>>>>>>>> where stuff I was working on went smoothly in dev, but stopped once
>>>>>>>>> real load was pushed through it (thankfully for me, that was in
>>>>>>>>> testing, rather than released to customers).
>>>>>>>>>
>>>>>>>>> It's a frustrating experience, and I don't think that a reputation for
>>>>>>>>> "works until you cross a threshold, and then it doesn't, but only in
>>>>>>>>> production" is a good thing to move towards.
>>>>>>>>>
>>>>>>>>> Perhaps something like adding a key to the returned data along the
>>>>>>>>> lines of "_slow_warning": "This query is going to be slow on large
>>>>>>>>> data sets. See http://...", in addition to the ?slow_warning=true
>>>>>>>>> query param (note that I'm calling it "slow_warning" in both places
>>>>>>>>> only to increase discoverability; without the URL param, the no-index
>>>>>>>>> query wouldn't work at all). Bikeshed the name as needed.
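Seen from a client, this two-part proposal (an opt-in query parameter plus a top-level key in the response) might look like the following minimal Python sketch. It assumes the response is a JSON object with a `docs` array next to `_slow_warning`; the names come from the mail above, while `find_url` and `slow_warning_of` are hypothetical helpers.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def find_url(base: str, db: str, slow_warning: bool) -> str:
    """Build a _find URL, adding the proposed ?slow_warning=true opt-in."""
    url = f"{base.rstrip('/')}/{db}/_find"
    if slow_warning:
        url += "?" + urlencode({"slow_warning": "true"})
    return url


def slow_warning_of(response_body: str) -> Optional[str]:
    """Return the proposed top-level "_slow_warning" value, if present."""
    return json.loads(response_body).get("_slow_warning")


print(find_url("http://localhost:5984/", "mydb", slow_warning=True))
# http://localhost:5984/mydb/_find?slow_warning=true
print(slow_warning_of('{"docs": [], "_slow_warning": "This query is going to be slow"}'))
# This query is going to be slow
```

Using the same name for the parameter and the response key means a developer who sees one can search for the other, which is the discoverability argument made in the mail.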
>>>>>>>>>
>>>>>>>>> I'd like to see a lot more URLs in CouchDB error messages in general,
>>>>>>>>> actually - I would find it very useful when trying to determine what's
>>>>>>>>> going wrong to have a URL right there in the logs that I can get more
>>>>>>>>> information from.
>>>>>>>>>
>>>>>>>>> On Sun, Jan 10, 2016 at 11:54 AM, Joan Touzet <[email protected]> wrote:
>>>>>>>>>> Hi Robert,
>>>>>>>>>>
>>>>>>>>>> I've been thinking about this one for a week or so, and I have a
>>>>>>>>>> simple suggestion:
>>>>>>>>>>
>>>>>>>>>> Add the query parameter slow=true to enable this behaviour.
>>>>>>>>>>
>>>>>>>>>> This meets all the original requirements:
>>>>>>>>>>
>>>>>>>>>> 1. It is not default behaviour
>>>>>>>>>> 2. You can grep the log files for the word 'slow' and find evidence
>>>>>>>>>> 3. There is a shorthand, simple way to enable the behaviour
>>>>>>>>>> 4. Any self-respecting developer will try to remove slow=true, find
>>>>>>>>>> a break, and be forced to learn about indexes
>>>>>>>>>> 5. It's a bit cheeky, which I think is kind of fun :D
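The first two requirements in this list map onto a small gate in the query path. A Python sketch under stated assumptions: `run_find`, `matches`, `NoIndexError` and the log wording are hypothetical; `has_index` stands in for real index selection; and `matches` handles only top-level equality, not the full Mango selector language.

```python
import logging
from typing import Any, Dict, List, Mapping

log = logging.getLogger("mango")


class NoIndexError(Exception):
    """Raised when a selector has no usable index and slow=true is absent."""


def matches(selector: Mapping[str, Any], doc: Mapping[str, Any]) -> bool:
    # Simplification: equality on top-level fields only, no $-operators.
    # An empty selector matches every document (a "catch all").
    return all(doc.get(field) == value for field, value in selector.items())


def run_find(
    selector: Mapping[str, Any],
    docs: List[Dict[str, Any]],
    has_index: bool,
    params: Mapping[str, str],
) -> List[Dict[str, Any]]:
    if not has_index:
        if params.get("slow") != "true":
            # Requirement 1: scanning without an index is not the default.
            raise NoIndexError("no index for this selector; create one or pass slow=true")
        # Requirement 2: leave a greppable log line containing "slow".
        log.warning("slow query without index: %r", dict(selector))
    # Index lookup itself is out of scope; this sketch scans in both cases.
    return [doc for doc in docs if matches(selector, doc)]


sample = [{"_id": "a", "type": "movie"}, {"_id": "b", "type": "book"}]
try:
    run_find({"type": "movie"}, sample, has_index=False, params={})
except NoIndexError as exc:
    print("refused:", exc)
print(run_find({"type": "movie"}, sample, has_index=False, params={"slow": "true"}))
# [{'_id': 'a', 'type': 'movie'}]
```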
>>>>>>>>>>
>>>>>>>>>> All the best,
>>>>>>>>>> Joan
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "William Edney" <[email protected]>
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Sent: Friday, January 8, 2016 10:27:29 AM
>>>>>>>>>>> Subject: Re: [POC] Mango Catch All Selector
>>>>>>>>>>>
>>>>>>>>>>> Hi Robert -
>>>>>>>>>>>
>>>>>>>>>>> As a builder of UI, API and library code who has also done
>>>>>>>>>>> developer training on a variety of technologies, I think one simple
>>>>>>>>>>> fix might be to go ahead and not require indexes to be built, but
>>>>>>>>>>> then to put a big NOTE at the beginning of the "Mango Getting
>>>>>>>>>>> Started" guide (I would assume there is such a piece of
>>>>>>>>>>> documentation) that states: "Note that the examples in this
>>>>>>>>>>> document do not require you to build an index, but for performance
>>>>>>>>>>> reasons we HIGHLY RECOMMEND that you do so. *Click here* for more
>>>>>>>>>>> information about how to do that" (or some such verbiage).
>>>>>>>>>>>
>>>>>>>>>>> My 2 cents.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> - Bill
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 8, 2016 at 9:04 AM, Robert Kowalski <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi list,
>>>>>>>>>>>>
>>>>>>>>>>>> At the end of this mail I would like to invite the other folks on
>>>>>>>>>>>> the mailing list who build interfaces for humans (APIs, CLIs or
>>>>>>>>>>>> even UIs) to chime in again with their opinions. So, to everyone on
>>>>>>>>>>>> the ML: this mail is not just a response to Paul, and feedback is
>>>>>>>>>>>> welcome :)
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Paul, I agree with you about the timeout. It could lead to very
>>>>>>>>>>>> unpleasant errors which are hard to debug and support.
>>>>>>>>>>>>
>>>>>>>>>>>> I added some thoughts to the other points you made:
>>>>>>>>>>>>
>>>>>>>>>>>>> a) know that the slow queries logs exist,
>>>>>>>>>>>>
>>>>>>>>>>>> Hmm... If I look at the 1.x logging, it was very straightforward. As
>>>>>>>>>>>> a developer you would spin up a CouchDB and get all the log messages
>>>>>>>>>>>> in your terminal, which was quite handy for all kinds of debugging.
>>>>>>>>>>>> In my opinion, the fact that the logs are not displayed directly on
>>>>>>>>>>>> stdout/stderr is a general 2.x problem. It affects all kinds of log
>>>>>>>>>>>> messages we produce in CouchDB 2.x and is not specific to
>>>>>>>>>>>> slow-query logging.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Ie, "You can try queries with testing:true, when you're ready to
>>>>>>>>>>>>> move to production you can POST your selector to _index to create
>>>>>>>>>>>>> the index which allows you to remove testing:true".
>>>>>>>>>>>>
>>>>>>>>>>>> I really like the migration path you mentioned here with the API to
>>>>>>>>>>>> create indexes. I am worried about having too high an entry barrier
>>>>>>>>>>>> for absolute newcomers, people who want to play around before they
>>>>>>>>>>>> are ready to think about indexes, e.g. by coupling the index topic
>>>>>>>>>>>> to querying from the very beginning.
>>>>>>>>>>>>
>>>>>>>>>>>> When I throw too many things to learn at people (who may not have
>>>>>>>>>>>> used a database before), most of them get discouraged and don't
>>>>>>>>>>>> take a look. The usual things they feel or say are: "too
>>>>>>>>>>>> complicated", "I don't have enough time", "product XY is easier to
>>>>>>>>>>>> use".
>>>>>>>>>>>>
>>>>>>>>>>>> I would argue that newcomers to a database will not launch a
>>>>>>>>>>>> high-traffic, multi-gigabyte product with the database from day
>>>>>>>>>>>> one. Day one is the day they learn how to put data into the
>>>>>>>>>>>> database and query it. Even where people already run a high-traffic
>>>>>>>>>>>> system and have used other databases at medium to large scale, I
>>>>>>>>>>>> would expect that, if they migrate to Couch, they run both systems
>>>>>>>>>>>> in parallel at first in order to fix the issues that occur during
>>>>>>>>>>>> the migration.
>>>>>>>>>>>>
>>>>>>>>>>>> I think we share the same goal (getting beginners started quickly),
>>>>>>>>>>>> and the cool thing about your suggestion is that everyone gets the
>>>>>>>>>>>> knowledge required to run a production system right from the very
>>>>>>>>>>>> start. My suggestion leaves some parts out, but reduces the
>>>>>>>>>>>> cognitive load required to get the very first basic results, e.g.
>>>>>>>>>>>> in a university class setting, or for junior developers on their
>>>>>>>>>>>> "casual Friday 20% time". My big hope is that once those folks build
>>>>>>>>>>>> high-traffic systems, they remember how easy CouchDB was to use and
>>>>>>>>>>>> start learning more about it in order to run it in a system with
>>>>>>>>>>>> more than a few thousand documents.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> For both of us I think the "what" is clear, but the "how" differs a
>>>>>>>>>>>> bit. I also think this discussion is still making progress, but I am
>>>>>>>>>>>> afraid it could stall. I see that we both have very good starting
>>>>>>>>>>>> points, and I would like to invite the other folks on the mailing
>>>>>>>>>>>> list who build interfaces for humans (APIs, CLIs or even UIs) to
>>>>>>>>>>>> chime in again with their opinions - of course I'm also looking
>>>>>>>>>>>> forward to your answer :)
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Robert :)
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 6, 2016 at 6:21 PM, Paul Davis <[email protected]> wrote:
>>>>>>>>>>>>>>> - is a timeout solving the root cause or the symptoms? Could it be a
>>>>>>>>>>>>>>> temporary or additional step as in conjunction with query
>>>>>>>>>>>>>>> optimisation tooling?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It really depends. From my CouchDB admin and user perspective, this
>>>>>>>>>>>>>> doesn't seem so important to me right now. However, I recognize that
>>>>>>>>>>>>>> there are different usage scenarios with different requirements
>>>>>>>>>>>>>> (e.g. the ones at Cloudant).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't think there's anything special about Cloudant in this
>>>>>>>>>>>>> discussion. It's just a question of how we give new users the
>>>>>>>>>>>>> ability to easily test and learn the selector/query API while also
>>>>>>>>>>>>> preventing them from going too far without creating indexes for
>>>>>>>>>>>>> their queries. The slow queries messages are fine, but just as in
>>>>>>>>>>>>> any other database, they don't really prompt the developer to make
>>>>>>>>>>>>> the correct change. Ie, the developer has to be savvy enough to
>>>>>>>>>>>>> a) know that the slow queries logs exist, b) understand that
>>>>>>>>>>>>> creating an index would speed things up, and then c) know which
>>>>>>>>>>>>> index to create based on the logged query.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In my experience, the group of users we're concerned about in this
>>>>>>>>>>>>> discussion most likely don't know about any of those three things,
>>>>>>>>>>>>> which is why the current API is designed to force them to learn
>>>>>>>>>>>>> about and understand indexes as part of learning the API. Granted,
>>>>>>>>>>>>> the `_id > null` trick muddies that learning process. I would think
>>>>>>>>>>>>> that replacing the _id trick with `"testing": true` or similar would
>>>>>>>>>>>>> be an obvious indication to users that this is a dev/debug type
>>>>>>>>>>>>> feature, and when they went to production they would still be
>>>>>>>>>>>>> pushed to use an index. If we add the "create index from selector"
>>>>>>>>>>>>> API, then I think this would be a relatively straightforward
>>>>>>>>>>>>> on-ramp to both the query and index sides of the API. Ie, "You can
>>>>>>>>>>>>> try queries with testing:true, when you're ready to move to
>>>>>>>>>>>>> production you can POST your selector to _index to create the index
>>>>>>>>>>>>> which allows you to remove testing:true".
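The "create index from selector" idea above was a proposal, not an existing endpoint. A rough Python sketch of the translation step it implies, assuming the `{"index": {"fields": [...]}, "type": "json"}` body that Mango's `POST /{db}/_index` accepts; `index_from_selector` is hypothetical, looks only at top-level fields and `$and`, and ignores `$or`, nested object paths and sort order, all of which a real implementation would have to handle.

```python
from typing import Any, Dict, List, Mapping


def index_from_selector(selector: Mapping[str, Any]) -> Dict[str, Any]:
    """Derive a JSON index definition from the fields a selector mentions.

    Hypothetical helper: the result has the body shape POSTed to /{db}/_index.
    """
    fields: List[str] = []

    def collect(sel: Mapping[str, Any]) -> None:
        for key, value in sel.items():
            if key == "$and" and isinstance(value, list):
                for sub in value:  # ANDed conditions can share one index
                    collect(sub)
            elif not key.startswith("$") and key not in fields:
                fields.append(key)  # keep first-seen order from the selector

    collect(selector)
    if not fields:
        raise ValueError("selector names no fields to index")
    return {"index": {"fields": fields}, "type": "json"}


print(index_from_selector({"year": {"$gt": 2010}, "type": "movie"}))
# {'index': {'fields': ['year', 'type']}, 'type': 'json'}
```

Field order here simply follows the selector; for a JSON index, order matters for which queries and sorts can use it, so a production version would need a deliberate policy there.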
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's also why I don't particularly care for the timeout approach.
>>>>>>>>>>>>> It's a binary threshold that a user would (maybe) hit some unknown
>>>>>>>>>>>>> amount of time after they have come to falsely believe their app is
>>>>>>>>>>>>> working correctly. The feedback is "Everything is fine until it
>>>>>>>>>>>>> isn't". Consider an app that has been working for a week or a month
>>>>>>>>>>>>> or more and suddenly starts throwing timeouts for a query. From the
>>>>>>>>>>>>> user's perspective the database broke, because the query that used
>>>>>>>>>>>>> to work fine no longer does. And then there's the follow-on question
>>>>>>>>>>>>> of how that timeout might tell the user that they need an index, and
>>>>>>>>>>>>> that the fix may be as easy as POSTing their selector to the _index
>>>>>>>>>>>>> endpoint. Sure, Google would most likely have the answer if our docs
>>>>>>>>>>>>> are good enough, but by that point the developer is probably already
>>>>>>>>>>>>> experiencing downtime if their app is live, which means they're
>>>>>>>>>>>>> frantically trying to fix the thing. From my point of view, a few
>>>>>>>>>>>>> roadblocks that guide developers towards correct usage early on
>>>>>>>>>>>>> would be better than letting them reach the adrenaline-fueled
>>>>>>>>>>>>> expletive fountain of downtime.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>
