[dblinq] Re: new release ?

Jonathan Pryor Sat, 16 May 2009 04:56:47 -0700

Behold the actual underlying problem to why QueryCache is broken:

        var id = "foo";
        var a = db.Customers.Where(c => c.CustomerID == id);
        id = "bar";
        var b = db.Customers.Where(c => c.CustomerID == id);
        Console.WriteLine("HashCodes: a={0}; b={1}", a.GetHashCode(), 
b.GetHashCode());
        Console.WriteLine("a == b? {0}", a.Equals(b));


Output while running under .NET:

        HashCodes: a=38068051; b=17286255
        a == b? False

In short, it is not appropriate to use an expression tree as a key into
a Dictionary, as they do not demonstrate value equality properties.

QueryCache is broken (and effectively unfixably broken) because the only
useful key into the cache that we have...can't actually be used as a
key.

(Though I suppose we could instead the result of ToString() as the key
into the cache, but more on that below.)

However, just using ToString() won't fix the unit test breakage I wrote
yesterday.  It will instead just make the existing (un-patched)
ReadTest.A5_SelectSingleOrDefault() test fail (for precisely the same
"input parameter values are also cached" issue I referred to yesterday).

Finally, I'd like to put try one last effort at pointing out why
CompiledQuery.Compile() is a better approach.  Executing a Linq-to-SQL
query takes 3 steps:

     1. Create the expression tree (done by the compiler and at runtime
        via Queryable extension methods).
     2. Convert the expression tree into SQL code (done by DbLinq).
     3. Execute the SQL against the database.

The current QueryCache approach only caches the result of (2), and if
you've ever profiled expression trees you'll find that (1) can also be
quite slow (often unexpectedly so).  (Which is why, while using
ToString() would "fix" the key lookup over using the expression tree
itself, still isn't an ideal fix: ToString() requires traversing the
(possibly quite large) expression tree, and would thus add additional
overhead just to perform the cache lookup.)

Meanwhile, CompiledQuery.Compile() allows caching both steps (1) and
(2).  There is no need for the delegate returned by
CompiledQuery.Compile() to re-create the expression tree, nor to
re-parse it, which saves time (which is why CompiledQuery exists in the
first place).

Please consider implementing CompiledQuery.

 - Jon

On Sat, 2009-05-16 at 09:33 +0200, Giacomo Tesio wrote:
> AFAIK the closures are a sort of "pointer" to the value (actually to a
> wrapper of the context that is used).
>  
> I suppose that in this case, someone else (like the CLR) would update
> the closures values so that the yet compiled query would find the
> correct values in the environment.
>  
>  
> I promise I'll take a look to the QueryCache again monday, but I don't
> believe we could actually remove the cache, since the DbLinq's
> performance would make it unusable.
>  
> The problem here is understanding fully the CLR managing of closures
> and understanding why at different time you get different closures. We
> could also include the closures in the expression comparisation (which
> actually work only on the string rappresentation of the expression
> itself).
>  
>  
> Giacomo
> PS: as to the connected datacontext approach: I think that long lived
> datacontext should not be "connected", so ours will have
> ObjectTrackingEnable == false.
> 
> 
>  
> On Fri, May 15, 2009 at 7:34 PM, Jonathan Pryor <[email protected]>
> wrote:
> 
>         I think the correct approach is to use CompiledQuery, for
>         precisely the cache management reasons I originally outlined.
>         
>         Though long-lived DataContexts would be problematic, as you
>         point out, as you'd effectively clone the entire DB into
>         memory given enough time and queries...  I don't see how a
>         connected database approach (which DataContext follows) could
>         behave otherwise.
>         
>         Does DbLinq support compiled queries?  find says: 
>         
>                 $ find src -name CompiledQuery.cs
>                 src/DbLinq/System.Data.Linq/CompiledQuery.cs
>         
>         Survey says...No.  I believe we should.
>         
>         I'm not sure what you mean by our queries being static.  Once
>         we cache a query, we don't change it anymore.  (If we did,
>         we'd be altering the original query, which would be...bad.)
>         
>         I'm also not sure what you're getting at with closures.
>         
>         The way I envision CompiledQuery working is very
>         similar/identical to how your existing query caching works,
>         except instead of storing the SelectQuery type within
>         QueryCache, it would instead be stored within the delegate
>         returned from CompiledQuery.Compile().  Thus, the user
>         actually deals with pre-parsed query statements, and DbLinq
>         doesn't need to re-parse the LINQ statement again (as the
>         delegate returned by CompiledQuery.Compile() stores the
>         pre-parsed SelectQuery instance).
>         
>         I'm not sure how much work this would take, but I'm hopeful
>         that it wouldn't be too much work.
>         
>         However, this does bring up two related issues.
>         
>         1. I still need to figure out wtf is going on with the
>         NerdDinner caching bug I was seeing last week (and better, how
>         to reproduce in the unit tests so that you can take a better
>         look at it).
>         
>         2. SelectQuery.GetCommand() gives me really bad feelings,
>         because: 
>              A. It takes no arguments. 
>              B. It calls InputParameterExpression.GetValue() with no
>                 parameters. 
>              C. (A) and (B) together imply that, even though the
>                 underlying SELECT takes named parameters (yay),
>                 there's no way to actually provide them/alter them for
>                 the current SelectQuery instance (wtf?). 
>         I think this is why (1) fails for me (but again, I still need
>         to debug).
>         
>         In any event, it makes no sense to me at all.  The point to
>         having a SELECT with parameters is so that you can cache the
>         expression itself but vary the parameters.  But since we're
>         not providing any parameters, the parameters can't vary.
>         So...
>         
>         It makes my head hurt, if nothing else.
>         
>         I would instead expect AbstractQuery.GetCommand() to take an
>         'object[] parameters' argument (or similar) so that we can
>         cache the actual select statement w/o associated parameters.
>         This would also dovetail nicely with the semantics
>         CompiledQuery.Compile(), as you can provide parameters to the
>         expression you're compiling: 
>         
>                 var pepleWithLastName = CompiledQuery.Compile(
>                       (PeopleDb db, string lastName, int start, int count) =>
>                               (from p in db.People
>                                where p.LastName == lastName
>                                select p)
>                               .Skip(start)
>                               .Take(count));
>                 foreach (p in peopleWithLastName(myDB, "Foo", 0, 1)) ...
>         
>         Alas, CompiledQuery.Compile() will only create delegates
>         accepting 4 parameters, but there are workarounds...
>         - Jon 
>         
>         
>         
>         
>         On Fri, 2009-05-15 at 16:35 +0200, Giacomo Tesio wrote:
>         
>         > I'll really hope is it right... :-|
>         > It would be a great problem otherwise.
>         > 
>         > AFAIK, the datacontext rappresent a UnitOfWork. If the unit
>         > of work is long as the application lifetime, than keeping it
>         > alive is right.
>         > But in an internet application delivered via http, I've got
>         > my doubt.
>         > 
>         > If it's not readonly, the tracked entities would become an
>         > unaligned copy of the full database... in memory.
>         > 
>         > 
>         > That said, does DbLinq support compiled queries?
>         > 
>         > The problem with this approach, would be that our queries
>         > are not static: we progressively add clausoles to
>         > IQueryable<T>.
>         > And, how to handle with closures?
>         > 
>         > 
>         > Giacomo
>         > 
>         > 
>         > 
>         > On Fri, May 15, 2009 at 1:37 PM, Jonathan Pryor
>         > <[email protected]> wrote: 
>         > 
>         >         I think you'll find that you're doing it wrong. :-)
>         >         
>         >         First, I'm not sure that the assertion that all apps
>         >         have short-lived DataContexts is correct.  That's
>         >         certainly not the case for NerdDinner, which has
>         >         only one DataContext for the lifetime of the app.
>         >         Many apps may have short-lived DataContexts, but
>         >         many won't.
>         >         
>         >         Secondly, and primarily, Microsoft doesn't do
>         >         implicit query caching.  They do explicit query
>         >         caching:
>         >         
>         >                 
> http://blogs.msdn.com/ricom/archive/2008/01/11/performance-quiz-13-linq-to-sql-compiled-queries-cost.aspx
>         >                 
> http://blogs.msdn.com/ricom/archive/2008/01/14/performance-quiz-13-linq-to-sql-compiled-query-cost-solution.aspx
>         >         
>         >         For example: 
>         >         
>         >                 var fq = CompiledQuery.Compile 
>         >                 ( 
>         >                     (Northwinds nw) => 
>         >                             (from o in nw.Orders 
>         >                             select new 
>         >                                    { 
>         >                                        OrderID = o.OrderID, 
>         >                                        CustomerID = o.CustomerID, 
>         >                                        EmployeeID = o.EmployeeID, 
>         >                                        ShippedDate = o.ShippedDate 
>         >                                    }).Take(5) 
>         >                 );
>         >         
>         >         The result of CompiledQuery.Compile is a
>         >         pre-compiled, pre-analyzed query, which the user is
>         >         responsible for caching and dealing with.  Query
>         >         caches are not part of DataContext itself, precisely
>         >         because it's a recipe for a giant memory leak.
>         >         
>         >         (Consider a fictional app which uses Linq-to-SQL
>         >         once at startup, or otherwise very infrequently.
>         >         The DbLinq approach would assure that the original
>         >         cached queries would never be freed; it would be a
>         >         permanent memory tax on the app.)
>         >         
>         >         I would strongly suggest that you follow Microsoft's
>         >         approach, drop the DataContext query caching, and
>         >         use CompiledQuery instead.
>         >         
>         >         - Jon 
>         >         
>         >         
>         >         
>         >         On Fri, 2009-05-15 at 10:25 +0200, Giacomo Tesio
>         >         wrote:
>         >         
>         >         > OFFTOPIC about the interfaces
>         >         > IQueryCache is an example of an interface which
>         >         > must not be removed. This way if a skilled
>         >         > programmer would really need it, it could
>         >         > reimplement it to better fill it's requirement.
>         >         > 
>         >         > Even if using ReaderWriterLockSlim, thread safety
>         >         > has a cost: some one could require speed rather
>         >         > than thread safety.
>         >         > Than it should be possible to reimplement it.
>         >         > 
>         >         > Another use of such an interface could be to move
>         >         > the cache out of the appdomain (in dedicated cache
>         >         > servers) and share them among servers. This would
>         >         > also make the cache livecicle longher than the iis
>         >         > application.
>         >         > I'm not a fan of this solution (I'm not sure this
>         >         > would lead in better performances), but in an
>         >         > enteprise environment like ours, it had to be
>         >         > possible to take such a decision later.
>         >         > 
>         >         > That's why I think that good interfaces are a good
>         >         > thing: we should not decide how DbLinq as to be
>         >         > used (tecnologically speaking of course).
>         >         > 
>         >         > Having GOOD internal interfaces allow better
>         >         > flexibility in the long run, and produce a better
>         >         > open source product.
>         >         > 
>         >         > 
>         >         > BAD interfaces, on the other hand, reduce
>         >         > flexibility and improve developments effort, but I
>         >         > think that they always underling a wrong analisis
>         >         > or design (if not a completely missing one).
>         >         > 
>         >         > 
>         >         > I encounter often .NET programmers talking against
>         >         > interfaces. But till now, I've always noticed they
>         >         > are talking about wrong interfaces they have
>         >         > designed bottom up. It could took much time to
>         >         > explain a Microsoft .NET developer that the
>         >         > problem are not the interfaces, the problem is
>         >         > their design.
>         >         > 
>         >         > More or less like explaing them that a Domain
>         >         > Model IS Object Orientation, not a way of doing
>         >         > Object Orientation.
>         >         > Or to explain them that a object oriented language
>         >         > does not lead by itself to object oriented
>         >         > software...
>         >         > 
>         >         > 
>         >         > Ok... I don't like Microsoft. :-D
>         >         > 
>         >         > 
>         >         > 
>         >         > Giacomo
>         >         > 
>         >         > 
>         >         > On Fri, May 15, 2009 at 10:00 AM, Giacomo Tesio
>         >         > <[email protected]> wrote:
>         >         > 
>         >         >         That's a good question, i think.
>         >         >         
>         >         >         QueryCache is a cache of generated queries
>         >         >         for each expression tree evaluated.
>         >         >         It actually has to be static (at least
>         >         >         thread static, but this would multiply the
>         >         >         memory usage per number of threads, also
>         >         >         reducing the hits and reducing livetime to
>         >         >         the thread one) to improve the hits, since
>         >         >         in the most common DataContext use case,
>         >         >         it rappresents a unit of work and have
>         >         >         short live.
>         >         >         If the QueryCache livecicle would match
>         >         >         the DataContext one, probably it would
>         >         >         have no reason to exists.
>         >         >         
>         >         >         In "our" DbLinq use case, a readonly
>         >         >         DataContext is used from all threads and
>         >         >         has a long live (the AppDomain one), while
>         >         >         single unit of works actually are created
>         >         >         per request or on a per need basis.
>         >         >         
>         >         >         The global readonly one, has to be no
>         >         >         "instance caches" at all (no object
>         >         >         tracking for example, and I hope there are
>         >         >         no other caches... but actually I should
>         >         >         indagate this more), but still need a
>         >         >         QueryCache becouse it could share its yet
>         >         >         parsed expression tree among threads.
>         >         >         
>         >         >         Moreover all the DbLinq DataContexts would
>         >         >         benefit from such a static queries,
>         >         >         greatly increasing the performances (I
>         >         >         hope! ! ! :-D).
>         >         >         
>         >         >         Other caches (like the EntityTrakings one)
>         >         >         must not be static since they are
>         >         >         conceptually linked to the DataContext
>         >         >         that fill and use them.
>         >         >         
>         >         >         
>         >         >         
>         >         >         
>         >         >         Giacomo 
>         >         >         
>         >         >         
>         >         >         
>         >         >         
>         >         >         On Thu, May 14, 2009 at 5:19 PM, Jonathan
>         >         >         Pryor <[email protected]> wrote: 
>         >         >         
>         >         >                 This is bound to be a stupid/silly
>         >         >                 question, but why do the caches
>         >         >                 need to be static?  Static data is
>         >         >                 effectively global, i.e. a GC root
>         >         >                 in and of itself, and thus will
>         >         >                 never be collected.  Even with a
>         >         >                 good policy, it's possible that
>         >         >                 this could use more memory than
>         >         >                 people would expect.
>         >         >                 
>         >         >                 See also:
>         >         >                 
>         >         >                 
> http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx
>         >         >                 
>         >         >                 
> http://blogs.msdn.com/ricom/archive/2004/01/19/60280.aspx
>         >         >                 
>         >         >                 Is there really a need for a cache
>         >         >                 that's static (i.e. shared amongst
>         >         >                 all DataContext instances)?  Or
>         >         >                 can it just be non-static and
>         >         >                 attached to the DataContext (which
>         >         >                 would also remove all thread
>         >         >                 safety requirements).
>         >         >                 
>         >         >                 Put another way, with non-shared
>         >         >                 caches if the DataContext gets
>         >         >                 collected then the cache is also
>         >         >                 collected, thus providing a
>         >         >                 natural mechanism to clear the
>         >         >                 cache.  With shared (static)
>         >         >                 caches, they're not connected to
>         >         >                 the DataContext, and thus it could
>         >         >                 be holding cached data for a
>         >         >                 DataContext that no longer exists.
>         >         >                 (This may not be the case anyway;
>         >         >                 I haven't fully read and
>         >         >                 understood the code.  I'm just
>         >         >                 trying to make clear that
>         >         >                 preferring shared caches isn't an
>         >         >                 open and shut easy decision.)
>         >         >                 
>         >         >                 Thanks,
>         >         >                 - Jon 
>         >         >                 
>         >         >                 
>         >         >                 
>         >         >                 On Thu, 2009-05-14 at 16:55 +0200,
>         >         >                 Giacomo Tesio wrote: 
>         >         >                 
>         >         >                 > I've two need:
>         >         >                 > - Thread Safety of static
>         >         >                 > caches: should be done for
>         >         >                 > QueryCache, but Jon have
>         >         >                 > encountered a strange
>         >         >                 > (unreproduced on tests) bug
>         >         >                 > while working on NerdDinner. If
>         >         >                 > no other static cache exists
>         >         >                 > they are ok.
>         >         >                 > 
>         >         >                 > 
>         >         >                 > - XmlMappingSource working
>         >         >                 > correctly: now, associations are
>         >         >                 > not loaded from external
>         >         >                 > mappings. This fix require
>         >         >                 > IDataMapper and DataMapper
>         >         >                 > modifications and DataContext
>         >         >                 > fixes.
>         >         >                 > 
>         >         >                 > 
>         >         >                 > The first is absolutelly needed
>         >         >                 > (but should yet work right, just
>         >         >                 > missing true multithread test on
>         >         >                 > multi core machines).
>         >         >                 > The second I think is really
>         >         >                 > important, but require a bit of
>         >         >                 > work.
>         >         >                 > 
>         >         >                 > 
>         >         >                 > 
>         >         >                 > Giacomo
>         >         >                 > 
>         >         >                 > 
>         >         >                 > 
>         >         >                 > On Thu, May 14, 2009 at 4:28 PM,
>         >         >                 > Sharique <[email protected]>
>         >         >                 > wrote:
>         >         >                 > 
>         >         >                 >         
>         >         >                 >         Hi,
>         >         >                 >         It has been almost 1
>         >         >                 >         year since last release,
>         >         >                 >         I think it is time make
>         >         >                 >         a new release (guess
>         >         >                 >         0.19) . If there is any
>         >         >                 >         blocking issue, pls put
>         >         >                 >         it here for discussion.
>         >         >                 >         So that we can resolve
>         >         >                 >         it quickly.
>         >         >                 >         
>         >         >                 >         --
>         >         >                 >         Sharique
>         >         >                 >         
>         >         >                 > 
>         >         >                 > 
>         >         >                 > 
>         >         >                 > 
>         >         >                 
>         >         >                 
>         >         >                 
>         >         >                 
>         >         >         
>         >         >         
>         >         >         
>         >         > 
>         >         > 
>         >         > 
>         >         > 
>         >         
>         >         
>         >         
>         >         
>         > 
>         > 
>         > 
>         > 
>         
>         >         
>         

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"DbLinq" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/dblinq?hl=en
-~----------~----~----~----~------~----~------~--~---

[dblinq] Re: new release ?

Reply via email to