Re: RE: RE: [nhibernate-development] NH-2583 - Query with || operator and navigations (many-to-one) creates wrong joins

Harald Mueller Wed, 13 Apr 2011 04:37:17 -0700

Hi Stefan -

> Hi Harald,
> 
> so you basically say that LINQ is just for convenience, for the many easy
> queries. You just write and forget them, and they will just work.


Oops. I seem to have "misworded" my answer quite a lot. I wanted to say what I 
consider almost the *opposite* of what you understood:

* Linq has to work for all queries, even the most complicated ones.
* You write and "forget" them in the same sense that you "write and forget" C#: 
I don't care what happens in the compiler, the operating system, or the processor.

The only other thing I tried to say is:

* If you *want* WHERE A = B in your query (with SQL's semantics, where rows with 
both A and B NULL do not match), then this is neither simple nor complicated - 
it is not a use case for Linq at all.

> As soon as stuff gets complicated (so you need to hack around with ad hoc
> SQL queries) and/or performance critical (so you need to read the SQL and
> modify the query to generate another SQL) you'd just re-write the query in
> SQL or HQL and not bother with LINQ at all. 
> 
> Did I get this right?

No ... but that was due to my wording ...

> 
> It's a valid direction to take. Personally, I like to be able to use one
> language that takes me all the way. 

... Yes!

> So I want to be able to write as many queries
> in LINQ as possible, 

... still yes ...

> and the best way to get there is to think about these
> queries in DB terms, not in terms of in-memory objects.

... aha: here, I say loud and distinctly: No.

Here is a "simple complex example" in "texty UML":

Order <>---* OrderLine --->1 Product --->1 ProductGroup

1. Find all orders where some order line references a product of (a fixed) 
group A.

Linq: 
    From<Order>()
        .Where(o => o.OrderLines
                     .Any(li => li.Product.ProductGroup == fixedGroupA));

Predicate calculus:
    ...like Linq, only with funny symbols (like the inverted E)...

SQL: SELECT DISTINCT Order.* FROM
        Order 
        INNER JOIN OrderLine ON ...
        INNER JOIN Product ON ...
     WHERE
        Product.ProductGroupId = fixedGroupAId

2. Find all orders where *not* some order line references a product of (a 
fixed) group A. Note: Minimal addition of the word "not".

Linq - also minimal addition of "!":
    From<Order>()
        .Where(o => !o.OrderLines
                     .Any(li => li.Product.ProductGroup == fixedGroupA));

Predicate calculus:
    ...introduction of a simple negation...

SQL: ... a famous brain teaser for SQL people (typically NOT EXISTS with a 
correlated subquery) ... Looks totally different from the query above.
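To make the contrast between examples 1 and 2 concrete, here is a minimal sketch in Python with the standard-library sqlite3 module, used only as a runnable stand-in for C# and a real database. The schema, table names and sample rows are hypothetical (the table is called Orders because ORDER is an SQL keyword). It runs the join for example 1, a naive "just negate the WHERE" attempt at example 2, and the NOT EXISTS formulation, and compares them with an in-memory any() / not any() evaluation in the spirit of Linq2Objects.

```python
import sqlite3

# Hypothetical schema for: Order <>---* OrderLine --->1 Product --->1 ProductGroup
SCHEMA = """
CREATE TABLE ProductGroup (Id INTEGER PRIMARY KEY);
CREATE TABLE Product (Id INTEGER PRIMARY KEY, ProductGroupId INTEGER NOT NULL);
CREATE TABLE Orders (Id INTEGER PRIMARY KEY);
CREATE TABLE OrderLine (Id INTEGER PRIMARY KEY, OrderId INTEGER NOT NULL,
                        ProductId INTEGER NOT NULL);
"""

GROUP_A = 1  # the "fixed group A"

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO ProductGroup VALUES (?)", [(1,), (2,)])
    db.executemany("INSERT INTO Product VALUES (?, ?)", [(10, 1), (20, 2)])
    db.executemany("INSERT INTO Orders VALUES (?)", [(100,), (200,), (300,), (400,)])
    # 100: only group-A lines; 200: lines of both groups; 300: no group-A line; 400: no lines
    db.executemany("INSERT INTO OrderLine VALUES (?, ?, ?)",
                   [(1, 100, 10), (2, 100, 10), (3, 200, 10), (4, 200, 20), (5, 300, 20)])
    return db

# Example 1: join; DISTINCT because one order can have several matching lines.
Q1 = """SELECT DISTINCT o.Id FROM Orders o
        JOIN OrderLine li ON li.OrderId = o.Id
        JOIN Product p ON p.Id = li.ProductId
        WHERE p.ProductGroupId = ?"""

# Example 2, naive attempt: negate only the WHERE condition (wrong).
Q2_NAIVE = """SELECT DISTINCT o.Id FROM Orders o
              JOIN OrderLine li ON li.OrderId = o.Id
              JOIN Product p ON p.Id = li.ProductId
              WHERE p.ProductGroupId <> ?"""

# Example 2, correct: negate the existential quantifier with NOT EXISTS.
Q2 = """SELECT o.Id FROM Orders o
        WHERE NOT EXISTS (SELECT 1 FROM OrderLine li
                          JOIN Product p ON p.Id = li.ProductId
                          WHERE li.OrderId = o.Id AND p.ProductGroupId = ?)"""

def ids(db, sql):
    return sorted(row[0] for row in db.execute(sql, (GROUP_A,)))

def in_memory(db, negate):
    # Linq2Objects-style: evaluate Any(...) per order, optionally negated with "not".
    orders = [row[0] for row in db.execute("SELECT Id FROM Orders")]
    lines = list(db.execute("SELECT li.OrderId, p.ProductGroupId FROM OrderLine li "
                            "JOIN Product p ON p.Id = li.ProductId"))
    return sorted(o for o in orders
                  if any(oid == o and g == GROUP_A for oid, g in lines) != negate)

db = make_db()
print(ids(db, Q1))                 # [100, 200]
print(ids(db, Q2_NAIVE))           # [200, 300]
print(ids(db, Q2))                 # [300, 400]
print(in_memory(db, negate=True))  # [300, 400]
```

The naive negation returns order 200 (it has *some* line outside group A) and misses order 400 (it has no lines at all); only the NOT EXISTS form agrees with simply putting "not" in front of the in-memory predicate.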

BUT I should now also give the opposite example: e.g., something with an OUTER 
JOIN and COALESCE, where the SQL formulation and changes are "small" and 
"obvious", whereas in Linq you'd have to re-think the query.

I don't give that example - you can insert your favorite one here.

What I want to say is that, even for not-too-complex queries, both relational 
calculus/SQL and predicate calculus/Linq can get quite hard to write.

Still, in my experience, predicate calculus (if packaged suitably, so that 
people don't "see" that it's "theory" and therefore shy away from it; C#'s List<> 
methods and now Linq are good examples of such packaging) is the "better" way: 
it quite often maps directly to the specifications in which people say or write 
things like examples 1 and 2 above.

For this reason, I teach every programmer I can grab that SQL is *not* the way 
to *think* about queries, even if it is the way to *implement* them. There are 
more examples where SQL gets unwieldy under simple modifications: e.g., finding 
the associated object in a :n association for which some value is the maximum 
requires a "back join" with risky key comparisons [unless you use MySQL's 
uncharted extension ...].
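The "back join" for the maximum case can be sketched as follows, again in Python with sqlite3 and a hypothetical OrderLine table with a Quantity column. The subquery computes the maximum per order, and the outer join then matches rows by comparing that *value* rather than a key, so a tie yields two rows for one order. Python's max(), like a single "pick the line with the maximum" step in a collection query, yields exactly one.

```python
import sqlite3
from itertools import groupby

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE OrderLine (Id INTEGER PRIMARY KEY, "
               "OrderId INTEGER NOT NULL, Quantity INTEGER NOT NULL)")
    # Order 100 has a unique maximum (9); order 200 has a tie at 7.
    db.executemany("INSERT INTO OrderLine VALUES (?, ?, ?)",
                   [(1, 100, 3), (2, 100, 9), (3, 200, 7), (4, 200, 7), (5, 200, 2)])
    return db

# "Back join": compute the maximum per order, then join back on the *value*.
BACK_JOIN = """SELECT li.OrderId, li.Id FROM OrderLine li
               JOIN (SELECT OrderId, MAX(Quantity) AS MaxQty
                     FROM OrderLine GROUP BY OrderId) m
                 ON m.OrderId = li.OrderId AND m.MaxQty = li.Quantity
               ORDER BY li.OrderId, li.Id"""

def back_join(db):
    return list(db.execute(BACK_JOIN))

def in_memory(db):
    # Collection style: per order, take the line with the largest quantity.
    rows = sorted(db.execute("SELECT OrderId, Id, Quantity FROM OrderLine"))
    result = []
    for order_id, lines in groupby(rows, key=lambda r: r[0]):
        best = max(lines, key=lambda r: r[2])  # on ties, max() returns the first one
        result.append((order_id, best[1]))
    return result

db = make_db()
print(back_join(db))  # [(100, 2), (200, 3), (200, 4)]
print(in_memory(db))  # [(100, 2), (200, 3)]
```

Which tied line should win is a separate decision: the SQL form needs an additional tie-breaker to make it, while the in-memory form simply takes the first maximal element.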

That does not at all mean that SQL is "bad" in any sense. If you do your job 
in SQL (relational calculus) "all the time", then that calculus is a perfectly 
valid tool.

But *if* you start working with Linq (or predicate calculus; or other QLs based 
on quantifications), then I find (and have found) the idea to

    "think about these queries in DB terms"

(I assume you mean SQL / relational calculus here) - to use some "emotional 
words" - unproductive, quarrelsome, disturbing, error-prone ..., because it mixes 
two different concepts.

In a sense, what I want to say is the same as what you say:

> Personally, I like to be able to use one language that takes me all the way. 

But "all the way" for me also means "all the way in thinking and implementing 
and reasoning" (and maybe also testing in-memory; but that's certainly not at 
the center of my argument). Therefore, if I or we had to 

    "think about these queries in DB terms"

for some Linq provider because its semantics deviate from Linq2Objects 
*markedly* (not only in strange boundary cases), then that provider would, in my 
opinion, *not* (yet) "take us all the way".

Ok ... so this was another attempt to explain my standpoint: Hopefully clearer 
than the last time, even though - again - too long ...

> 
> To top your invalid comparison with another invalid one: What you're
> suggesting is like using automatic object brokers in your distributed app, and
> for every spot where the by-method remoting gets too slow, you just open up
> a HTTP connection and do it manually. ;-)

Actually, I find that as horrible as you do.

> So here's my opinion. I'm not a NH user, so other people will have to
> decide which way NH goes (maybe both).
> 
> > (b) Do you really need that SQL = *semantics*? I.e., do you *want* to
> > *not* get the objects where both A and B are null? In all my career, I
> > have not seen application code that *relied* on such a result 
> 
> Like I said: when you join tables, you wouldn't have it any other way. And
> you can use multiple FROMs and a WHERE statement for joining too.

Ah - but (a) you don't write joins in Linq with == (you use navigation) 
[except ... another debate], so this is no argument; and (b) for joins, the 
primary key is never null, so the extended == translation is definitely not 
necessary - see (f) or (g).

> (right now there are more serious flaws in the NH
> provider that should get attention first). 

First, a wholehearted yes. Then: do we agree on what they are? Take a look at 
NH-2648 and NH-2649, which I just posted to JIRA: are they as critical for you as 
they are for me?

> And maybe at the end of the day it
> turns out that the differences between both ways are not worth the confusion. 

Might be so!

> 
> But since both ways come with tradeoffs, maybe it's best to let the user
> decide.

We'll work on this when we have time, won't we? ;-)

Regards
Harald

P.S. 
Re Stefan's initial remark "You just write and forget them [the Linq queries], 
and they will just work.": One can ask this even for Linq2Objects: do you care 
about the performance of a Linq expression or not? In some sense, I don't care: 
"write and forget" - how a .Join or .Where or .Skip runs behind the scenes is an 
internal problem. But is this really so? I guess not when your lists are long 
(millions of objects) or your algorithms are deeply nested (O(n^3), O(n^4), 
etc.).
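The P.S. point can be illustrated without any database. In this hypothetical Python sketch, both functions compute the same pairs, but the nested loop does work proportional to len(orders) * len(lines), while the hash-based version indexes the lines once. As far as I know, .NET's Enumerable.Join takes the second approach internally (it builds a lookup over the inner sequence), whereas hand-written nested Where/Any calls behave like the first; "write and forget" stops working exactly when that difference starts to matter.

```python
from collections import defaultdict

def nested_loop_join(orders, lines):
    # Work grows with len(orders) * len(lines): every order rescans every line.
    return [(o, line) for o in orders for line in lines if line[1] == o]

def hash_join(orders, lines):
    # Work grows with len(orders) + len(lines): index the lines by order id once.
    by_order = defaultdict(list)
    for line in lines:
        by_order[line[1]].append(line)
    return [(o, line) for o in orders for line in by_order.get(o, [])]

# Hypothetical data: lines are (line_id, order_id) pairs.
orders = [1, 2, 3]
lines = [(10, 1), (11, 2), (12, 1), (13, 9)]
print(nested_loop_join(orders, lines) == hash_join(orders, lines))  # True
```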

