Providing methods that work sometimes and don't work other times is 
generally a bad idea.

No matter how much you document it, users *will* try to use it and 
expect it to always work, either because they didn't read the docs that 
say otherwise or because they think they'll stick to a configuration 
where it does work.

Then, when it doesn't work (say, because they pushed something to 
production, which has a different configuration than dev), it's a 
frustrating experience.

-Dennis

On 03/11/2014 09:37 AM, Randall Hauch wrote:
> I’m struggling with this same question in ModeShape. The JCR API exposes a 
> method that returns the number of results, but at least the spec allows the 
> implementation to return -1 if the size is not known (or very expensive to 
> compute). Yet this still does not satisfy all cases.
>
> Depending upon the technology, computing the **exact size** ranges from very 
> cheap to extremely expensive. For example, consider a system 
> that has to take into account access control limitations of the user. My 
> current opinion is that few applications actually need an exact size, and if 
> they do there may be alternatives (like counting as they iterate over the 
> results).
>
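
To illustrate the "counting as they iterate" alternative mentioned above, here is a minimal sketch. `CountingIterator` is a hypothetical helper written for this example; it is not part of JCR, ModeShape, or Infinispan.

```java
import java.util.Iterator;

/**
 * Hypothetical wrapper (not a JCR or Infinispan type): counts elements as the
 * client consumes them, so no separate size computation is needed.
 */
final class CountingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private long seen; // number of elements handed out so far

    CountingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T item = delegate.next(); // throws NoSuchElementException when exhausted
        seen++;
        return item;
    }

    /** Exact only once the iterator is exhausted; a lower bound before that. */
    long seenSoFar() {
        return seen;
    }
}
```

The count is exact only after the client has walked the whole result, which is the trade-off: nothing extra is computed up front, but the size is not known before iteration.
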
> An alternative is to expose an **approximate size**, which is likely to be 
> sufficient for generating display or other pre-computed information such as 
> links or paging details. I think that this is sufficient for most needs, and 
> that even an order of magnitude is sufficient. When the results are known to 
> be small, the system might want to determine the exact size (e.g., by 
> iterating).
>
> So one option is to expose both methods, but allow the exact size method to 
> return -1 if the system can’t determine the size or if doing so is very 
> expensive. This allows the system a way out for large/complex queries and 
> flexibility in the implementation technology. The approximate size method 
> probably always needs to return at least some usable value.
>
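
The two-method option described above might look roughly like the sketch below. Every name here (`SizedResults`, `InMemoryResults`, `StreamedResults`, `PagingLabel`) is an assumption made up for illustration; none of it is the JCR, ModeShape, or Infinispan API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical result type exposing both an exact and an approximate size. */
interface SizedResults<T> extends Iterable<T> {
    long UNKNOWN_SIZE = -1L;

    /** Exact number of results, or UNKNOWN_SIZE if unknown or too expensive. */
    long exactSize();

    /** Rough estimate; expected to always return a usable, non-negative value. */
    long approximateSize();
}

/** Small result held in memory: the exact size is cheap, so both methods agree. */
final class InMemoryResults<T> implements SizedResults<T> {
    private final List<T> rows;

    InMemoryResults(List<T> rows) {
        this.rows = new ArrayList<>(rows);
    }

    @Override public long exactSize() { return rows.size(); }
    @Override public long approximateSize() { return rows.size(); }
    @Override public Iterator<T> iterator() { return rows.iterator(); }
}

/** Large streamed result: only an estimate is available. */
final class StreamedResults<T> implements SizedResults<T> {
    private final Iterable<T> source;
    private final long estimate;

    StreamedResults(Iterable<T> source, long estimate) {
        this.source = source;
        this.estimate = estimate;
    }

    @Override public long exactSize() { return UNKNOWN_SIZE; } // the "way out"
    @Override public long approximateSize() { return estimate; }
    @Override public Iterator<T> iterator() { return source.iterator(); }
}

/** Example caller: prefers the exact size and falls back to the estimate. */
final class PagingLabel {
    static String describe(SizedResults<?> results) {
        long exact = results.exactSize();
        if (exact != SizedResults.UNKNOWN_SIZE) {
            return "found " + exact + " matches";
        }
        return "found about " + results.approximateSize() + " matches";
    }
}
```

Note that the caller still has to handle the -1 case explicitly, which is exactly the kind of "works sometimes" contract being debated in this thread.
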
> BTW, computing an exact size by iterating can be expensive unless you can 
> keep all the results in memory. That’s not ideal - a query with large results 
> could fill up available memory. If you don’t keep all results in memory, then 
> if you’re going to allow clients to access the results more than once you 
> have to provide a way to buffer the results.
>
>
> On Mar 10, 2014, at 7:23 AM, Sanne Grinovero <[email protected]> wrote:
>
>> Hi all,
>> we are exposing a nice feature inherited from the Search engine via
>> the "simple" DSL version, the one which is also available via Hot Rod:
>>
>> org.infinispan.query.dsl.Query.getResultSize()
>>
>> To be fair I hadn't noticed we do expose this, I just noticed after a
>> recent PR review and I found it surprising.
>>
>> This method returns the size of the full result set, disregarding
>> pagination options; you can imagine it being a fit for situations like:
>>
>>    "found 6 million matches, these are the top 20: "
>>
>> A peculiarity of Hibernate Search is that the total number of matches
>> is extremely cheap to figure out as it's generally a side effect of
>> finding the 20 results. Essentially we're just exposing an int value
>> which was already computed: very cheap, and happens to be useful in
>> practice.
>>
>> This is not the case with a SQL statement: there you'd have to
>> craft 2 different SQL statements, often incurring the cost of 2 round
>> trips to the database. So this getResultSize() is not available on the
>> Hibernate ORM Query, only on our FullTextQuery extension.
>>
>> Now my doubt is whether it is indeed a wise move to expose this method on
>> the simplified DSL. Of course some people might find it useful, but
>> I'm wondering how much we'll be swearing at having to maintain this
>> feature, compared with its usefulness, once we implement alternative
>> execution engines to run queries, not least Map/Reduce based filtering
>> and ultimately hybrid strategies.
>>
>> In the case of Map/Reduce I think we'll need to keep track of possible
>> de-duplication of results; in the case of a Teiid integration it might
>> need a second expensive query. So in such cases I'd expect this method
>> to be lazily evaluated.
>>
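
One possible reading of "lazily evaluated" is sketched below: the expensive count (a second query, or a de-duplicating pass) runs only if someone actually asks for the size, and the value is cached afterwards. `LazyResultSize` is a hypothetical helper for illustration, not how `org.infinispan.query.dsl.Query.getResultSize()` is implemented.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper: defers an expensive count until it is first
 * requested, then remembers the answer for later calls.
 */
final class LazyResultSize {
    private final LongSupplier counter; // e.g. issues the second, expensive query
    private boolean computed;
    private long value;

    LazyResultSize(LongSupplier counter) {
        this.counter = counter;
    }

    /** Runs the counter at most once; later calls return the cached value. */
    synchronized long get() {
        if (!computed) {
            value = counter.getAsLong();
            computed = true;
        }
        return value;
    }

    /** True once the count has been computed. */
    synchronized boolean isComputed() {
        return computed;
    }
}
```

Callers that never ask for the size pay nothing; callers that do ask pay the full cost on the first call, so the cost becomes less predictable rather than disappearing.
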
>> Should we rather remove this functionality?
>>
>> Sanne
>> _______________________________________________
>> infinispan-dev mailing list
>> [email protected]
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev