Re: [infinispan-dev] Query.getResultSize() to be available on the simplified DSL?

Sanne Grinovero Tue, 11 Mar 2014 12:19:19 -0700

What about calling it

int getEstimatedResultSize() ?


Having such a method occasionally return null looks very bad to me;
I'd rather remove the functionality.

-- Sanne
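
For comparison, here is a minimal Java sketch of the two shapes under discussion: an always-available `int getEstimatedResultSize()` versus the nullable `Integer getResultSize(Accuracy)` quoted below. The `SizedQuery` interface, the `ListQuery` toy engine and its `cheapEstimate` flag are hypothetical names made up for illustration; none of this is the actual Infinispan API.

```java
import java.util.Arrays;
import java.util.List;

public class ResultSizeSketch {

    enum Accuracy { EXACT, APPROXIMATE_OR_NULL }

    /** Hypothetical query shape for illustration; not the real Infinispan Query interface. */
    interface SizedQuery {
        /** Always returns a usable number, possibly approximate. */
        int getEstimatedResultSize();

        /** May return null when APPROXIMATE_OR_NULL is requested and no cheap estimate exists. */
        Integer getResultSize(Accuracy accuracy);
    }

    /** Toy in-memory engine; cheapEstimate simulates a backend that knows the count for free. */
    static final class ListQuery implements SizedQuery {
        private final List<String> matches;
        private final boolean cheapEstimate;

        ListQuery(List<String> matches, boolean cheapEstimate) {
            this.matches = matches;
            this.cheapEstimate = cheapEstimate;
        }

        @Override
        public int getEstimatedResultSize() {
            // A real engine might return a rough figure here; the toy one knows the exact count.
            return matches.size();
        }

        @Override
        public Integer getResultSize(Accuracy accuracy) {
            if (accuracy == Accuracy.EXACT || cheapEstimate) {
                return matches.size();
            }
            return null; // callers must check for null before unboxing
        }
    }

    /** Caller-side handling that the nullable variant forces on every user. */
    static String describe(SizedQuery query) {
        Integer size = query.getResultSize(Accuracy.APPROXIMATE_OR_NULL);
        return size == null ? "size unknown" : "about " + size + " matches";
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "b", "c");
        System.out.println(describe(new ListQuery(hits, true)));
        System.out.println(describe(new ListQuery(hits, false)));
        System.out.println(new ListQuery(hits, false).getEstimatedResultSize());
    }
}
```

With the nullable variant, a caller that assigns the result straight to an `int` gets a NullPointerException on a backend without a cheap estimate, which is the kind of configuration-dependent surprise raised in the thread.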

On 11 March 2014 19:08, Emmanuel Bernard <[email protected]> wrote:
> I agree with Randall.
>
> I tend to be very conservative about my public APIs. And offering an API that 
> I think will block me in the future is something I tend to avoid.
>
> Something like .guessNbrOfMatchingElements() / .guessResultSize() would 
> provide a better clue about the gamble the user takes. Note that the size is 
> irrespective of the pagination applied which renders this result quite cool 
> even if approximate.
>
> I’d be tempted not to put getResultSize() with an exact value in the public 
> contract as iterating is probably going to be as “fast”.
>
> An alternative is something like that (needs to be refined though)
>
> /**
>  * Get the result size.
>  * Approximate results are to be preferred, as they are usually very cheap
>  * to compute. If the computation is too expensive, a request with
>  * APPROXIMATE_OR_NULL accuracy returns null.
>  *
>  * Exact results are likely to be costly and require two queries.
>  */
> Integer getResultSize(Accuracy accuracy);
> enum Accuracy { EXACT, APPROXIMATE_OR_NULL }
>
> Emmanuel
>
> On 11 Mar 2014, at 18:23, Randall Hauch <[email protected]> wrote:
>
>> I disagree. Most developers have access to the JavaDoc, and if even 
>> moderately well-written, they will find out what the method returns and 
>> when. It’s no different than a method sometimes returning null rather than 
>> an object reference.
>>
>> On Mar 11, 2014, at 12:16 PM, Dennis Reed <[email protected]> wrote:
>>
>>> Providing methods that work sometimes and don't work other times is
>>> generally a bad idea.
>>>
>>> No matter how much you document it, users *will* try to use it and
>>> expect it to always work
>>> (either because they didn't read the docs that say otherwise, they think
>>> they'll stick to a configuration where it does work, etc.)
>>>
>>> And then when it doesn't work (because they pushed something to
>>> production which has a different configuration than dev, etc)
>>> it's a frustrating experience.
>>>
>>> -Dennis
>>>
>>> On 03/11/2014 09:37 AM, Randall Hauch wrote:
>>>> I’m struggling with this same question in ModeShape. The JCR API exposes a 
>>>> method that returns the number of results, but at least the spec allows 
>>>> the implementation to return -1 if the size is not known (or very 
>>>> expensive to compute). Yet this still does not satisfy all cases.
>>>>
>>>> Depending upon the technology, computing the **exact size** ranges from 
>>>> very cheap to extremely expensive. For example, consider a 
>>>> system that has to take into account access control limitations of the 
>>>> user. My current opinion is that few applications actually need an exact 
>>>> size, and if they do there may be alternatives (like counting as they 
>>>> iterate over the results).
>>>>
>>>> An alternative is to expose an **approximate size**, which is likely to be 
>>>> sufficient for generating display or other pre-computed information such 
>>>> as links or paging details. I think that this is sufficient for most 
>>>> needs, and that even an order of magnitude is sufficient. When the results 
>>>> are known to be small, the system might want to determine the exact size 
>>>> (e.g., by iterating).
>>>>
>>>> So one option is to expose both methods, but allow the exact size method 
>>>> to return -1 if the system can’t determine the size or if doing so is very 
>>>> expensive. This allows the system a way out for large/complex queries and 
>>>> flexibility in the implementation technology. The approximate size method 
>>>> probably always needs to return at least some usable value.
>>>>
>>>> BTW, computing an exact size by iterating can be expensive unless you can 
>>>> keep all the results in memory. That’s not ideal - a query with large 
>>>> results could fill up available memory. If you don’t keep all results in 
>>>> memory, then if you’re going to allow clients to access the results more 
>>>> than once you have to provide a way to buffer the results.
>>>>
>>>>
>>>> On Mar 10, 2014, at 7:23 AM, Sanne Grinovero <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>> we are exposing a nice feature inherited from the Search engine via
>>>>> the "simple" DSL version, the one which is also available via Hot Rod:
>>>>>
>>>>> org.infinispan.query.dsl.Query.getResultSize()
>>>>>
>>>>> To be fair I hadn't noticed we do expose this, I just noticed after a
>>>>> recent PR review and I found it surprising.
>>>>>
>>>>> This method returns the size of the full resultset, disregarding
>>>>> pagination options; you can imagine it fit for situations like:
>>>>>
>>>>>  "found 6 million matches, these are the top 20: "
>>>>>
>>>>> A peculiarity of Hibernate Search is that the total number of matches
>>>>> is extremely cheap to figure out as it's generally a side effect of
>>>>> finding the 20 results. Essentially we're just exposing an int value
>>>>> which was already computed: very cheap, and happens to be useful in
>>>>> practice.
>>>>>
>>>>> This is not the case with a SQL statement: there you'd have to
>>>>> craft 2 different SQL statements, often incurring the cost of 2 round
>>>>> trips to the database. So this getResultSize() is not available on the
>>>>> Hibernate ORM Query, only on our FullTextQuery extension.
>>>>>
>>>>> Now my doubt is if it is indeed a wise move to expose this method on
>>>>> the simplified DSL. Of course some people might find it useful, still
>>>>> I'm wondering how much we'll be swearing at needing to maintain this
>>>>> feature vs its usefulness when we'll implement alternative execution
>>>>> engines to run queries, not least on Map/Reduce based filtering, and
>>>>> ultimately hybrid strategies.
>>>>>
>>>>> In case of Map/Reduce I think we'll need to keep track of possible
>>>>> de-duplication of results; in case of a Teiid integration it might
>>>>> need a second expensive query. So in this case I'd expect this method
>>>>> to be lazily evaluated.
>>>>>
>>>>> Should we rather remove this functionality?
>>>>>
>>>>> Sanne
>>>>> _______________________________________________
>>>>> infinispan-dev mailing list
>>>>> [email protected]
>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
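
The "count as you iterate" alternative mentioned in the thread, combined with the JCR-style -1 sentinel for "size not known", could look roughly like the Java sketch below. `CountingResults` and `getExactSizeOrMinusOne` are hypothetical names chosen for illustration and belong to no existing API.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Wraps a result iterator and counts elements as the caller consumes them. */
public class CountingResults<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private long seen;

    public CountingResults(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T item = delegate.next();
        seen++; // only reached if the delegate actually produced an element
        return item;
    }

    /** Exact size once the results are exhausted; -1 while the size is still unknown. */
    public long getExactSizeOrMinusOne() {
        return delegate.hasNext() ? -1 : seen;
    }

    public static void main(String[] args) {
        CountingResults<String> results =
                new CountingResults<>(Arrays.asList("a", "b", "c").iterator());
        System.out.println(results.getExactSizeOrMinusOne()); // size unknown before iterating
        while (results.hasNext()) {
            results.next();
        }
        System.out.println(results.getExactSizeOrMinusOne()); // exact once exhausted
    }
}
```

This keeps memory use constant because nothing is buffered, at the price that the exact size only becomes available after a full pass and the results cannot be replayed.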
