OK, so I suggest to switch back to tree visitor and ignore closures values (not types) in the comparison process.
On Sat, May 16, 2009 at 21:08, Giacomo Tesio <[email protected]> wrote: > A note: on cache hit the expression tree visitor should be far faster than > the one creating the cached select query (that without the param values), > becouse it should only handle parameters itself. > > > I'll probably implement a tree visitor focused to handle them. > > > What do you think about this? > > > Giacomo > > On Sat, May 16, 2009 at 8:54 PM, Giacomo Tesio <[email protected]> wrote: > >> No it probably wasn't... >> >> But with the old approach the query cache hit only with queries with >> identical parameter values. This lead in a performance problem, since memory >> grow too much (a query is cached for each values' combination) and (since >> the hash code provided in ExpressionEqualityComparer didn't include the >> parameters' values) after a while the queries compared for equality during >> to the dictionary lookup would become too much. >> >> When I added the string comparison, I actually didn't go so deep in the >> code to understand that the parameters' values where cached too (even if I >> did some test which didn't show this problem... wrong test!) >> >> Let me some hours to search a better approach. >> Probably, the best one, would be to split the SelectQuery in two parts, >> one immutable and cachable (without the parameter values) and one with >> parameters' values included. >> >> The cache hit would provide a select query which actually require a parse >> of the expression tree (if I can not find a way to cache the closures) to >> fill the query with those values. >> >> >> That said, there are surely case (even for other use we do of >> DbLinq) where CompileQuery would be a better approach, and time allowing, >> I'll try to implement it. >> >> >> >> Giacomo >> >> On Sat, May 16, 2009 at 3:58 PM, Pascal Craponne <[email protected]>wrote: >> >>> I took a look at the history, and it is now just a string comparison. >>> Was it failing before the end of april (27th)? >>> >>> On Sat, May 16, 2009 at 15:50, Pascal Craponne <[email protected]> wrote: >>> >>>> Hi, >>>> right, two expressions at different places have a different signature >>>> (hash and reference). Even the same expression called twice has a >>>> difference >>>> signature. This is why when I wrote the Expression comparer I chosed the >>>> option to compare the comparable contents. >>>> I'm pretty sure it was working at that time. >>>> Someone probably added a new comparison, which now causes comparisons to >>>> fail. >>>> >>>> Do we have simple samples of comparison returning an equality where it >>>> should not? >>>> Do we also have opposite samples? >>>> >>>> I don't like the ToString() approach, since it is not proven to be 100% >>>> reliable (I'm not sure all sub expressions generate something and something >>>> unique where converted to a string). >>>> >>>> Pascal. >>>> >>>> On Sat, May 16, 2009 at 13:55, Jonathan Pryor <[email protected]> wrote: >>>> >>>>> Behold the actual underlying problem to why QueryCache is broken: >>>>> >>>>> var id = "foo"; >>>>> var a = db.Customers.Where(c => c.CustomerID == id); >>>>> id = "bar"; >>>>> var b = db.Customers.Where(c => c.CustomerID == id); >>>>> Console.WriteLine("HashCodes: a={0}; b={1}", a.GetHashCode(), >>>>> b.GetHashCode()); >>>>> Console.WriteLine("a == b? {0}", a.Equals(b)); >>>>> >>>>> Output while running under .NET: >>>>> >>>>> HashCodes: a=38068051; b=17286255 >>>>> a == b? False >>>>> >>>>> In short, it is *not* appropriate to use an expression tree as a key >>>>> into a Dictionary, as they do not demonstrate value equality properties. >>>>> >>>>> QueryCache is broken (and effectively unfixably broken) because the >>>>> only useful key into the cache that we have...can't actually be used as a >>>>> key. >>>>> >>>>> (Though I suppose we could instead the result of ToString() as the key >>>>> into the cache, but more on that below.) >>>>> >>>>> However, just using ToString() won't fix the unit test breakage I >>>>> wrote yesterday. It will instead just make the *existing*(un-patched) >>>>> ReadTest.A5_SelectSingleOrDefault() test fail (for precisely the same >>>>> "input parameter values are *also* cached" issue I referred to >>>>> yesterday). >>>>> >>>>> Finally, I'd like to put try one last effort at pointing out why >>>>> CompiledQuery.Compile() is a better approach. Executing a Linq-to-SQL >>>>> query takes 3 steps: >>>>> >>>>> 1. Create the expression tree (done by the compiler and at runtime >>>>> via Queryable extension methods). >>>>> 2. Convert the expression tree into SQL code (done by DbLinq). >>>>> 3. Execute the SQL against the database. >>>>> >>>>> The current QueryCache approach only caches the result of (2), and if >>>>> you've ever profiled expression trees you'll find that (1) can also be >>>>> quite >>>>> slow (often unexpectedly so). (Which is why, while using ToString()would >>>>> "fix" the key lookup over using the expression tree itself, still >>>>> isn't an ideal fix: ToString() requires traversing the (possibly quite >>>>> large) expression tree, and would thus add additional overhead just to >>>>> perform the cache lookup.) >>>>> >>>>> Meanwhile, CompiledQuery.Compile() allows caching *both* steps (1) * >>>>> and* (2). There is no need for the delegate returned by >>>>> CompiledQuery.Compile() to re-create the expression tree, nor to >>>>> re-parse it, which saves time (which is why CompiledQuery exists in >>>>> the first place). >>>>> >>>>> *Please* consider implementing CompiledQuery. >>>>> >>>>> - Jon >>>>> >>>>> >>>>> On Sat, 2009-05-16 at 09:33 +0200, Giacomo Tesio wrote: >>>>> >>>>> AFAIK the closures are a sort of "pointer" to the value (actually to a >>>>> wrapper of the context that is used). >>>>> >>>>> >>>>> >>>>> I suppose that in this case, someone else (like the CLR) would update >>>>> the closures values so that the yet compiled query would find the correct >>>>> values in the environment. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> I promise I'll take a look to the QueryCache again monday, but I don't >>>>> believe we could actually remove the cache, since the DbLinq's performance >>>>> would make it unusable. >>>>> >>>>> >>>>> >>>>> The problem here is understanding fully the CLR managing of closures >>>>> and understanding why at different time you get different closures. We >>>>> could >>>>> also include the closures in the expression comparisation (which actually >>>>> work only on the string rappresentation of the expression itself). >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Giacomo >>>>> >>>>> PS: as to the connected datacontext approach: I think that long lived >>>>> datacontext should not be "connected", so ours will have >>>>> ObjectTrackingEnable == false. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, May 15, 2009 at 7:34 PM, Jonathan Pryor <[email protected]> >>>>> wrote: >>>>> >>>>> I think the correct approach is to use CompiledQuery, for precisely >>>>> the cache management reasons I originally outlined. >>>>> >>>>> Though long-lived DataContexts would be problematic, as you point out, >>>>> as you'd effectively clone the entire DB into memory given enough time and >>>>> queries... I don't see how a connected database approach (which >>>>> DataContext follows) could behave otherwise. >>>>> >>>>> Does DbLinq support compiled queries? find says: >>>>> >>>>> $ find src -name CompiledQuery.cs >>>>> src/DbLinq/System.Data.Linq/CompiledQuery.cs >>>>> >>>>> Survey says...No. I believe *we should*. >>>>> >>>>> I'm not sure what you mean by our queries being static. Once we cache >>>>> a query, we don't change it anymore. (If we did, we'd be altering the >>>>> original query, which would be...bad.) >>>>> >>>>> I'm also not sure what you're getting at with closures. >>>>> >>>>> The way I envision CompiledQuery working is very similar/identical to >>>>> how your existing query caching works, except instead of storing the >>>>> SelectQuery type within QueryCache, it would instead be stored within >>>>> the delegate returned from CompiledQuery.Compile(). Thus, the user >>>>> actually deals with pre-parsed query statements, and DbLinq doesn't need >>>>> to >>>>> re-parse the LINQ statement again (as the delegate returned by >>>>> CompiledQuery.Compile() stores the pre-parsed SelectQuery instance). >>>>> >>>>> I'm not sure how much work this would take, but I'm hopeful that it >>>>> wouldn't be too much work. >>>>> >>>>> *However*, this does bring up two related issues. >>>>> >>>>> 1. I still need to figure out wtf is going on with the NerdDinner >>>>> caching bug I was seeing last week (and better, how to reproduce in the >>>>> unit >>>>> tests so that you can take a better look at it). >>>>> >>>>> 2. SelectQuery.GetCommand() gives me *really* bad feelings, because: >>>>> >>>>> 1. It takes no arguments. >>>>> 2. It calls InputParameterExpression.GetValue() with no parameters. >>>>> >>>>> 3. (A) and (B) together imply that, even though the underlying >>>>> SELECT takes named parameters (yay), there's no way to actually >>>>> provide them/alter them for the current SelectQuery instance >>>>> (wtf?). >>>>> >>>>> I *think* this is why (1) fails for me (but again, I still need to >>>>> debug). >>>>> >>>>> In any event, it makes no sense to me at all. The point to having a >>>>> SELECT with parameters is so that you can cache the expression itself >>>>> but vary the parameters. But since we're not providing any parameters, >>>>> the >>>>> parameters can't vary. So... >>>>> >>>>> It makes my head hurt, if nothing else. >>>>> >>>>> I would instead expect AbstractQuery.GetCommand() to take an 'object[] >>>>> parameters' argument (or similar) so that we can cache the actual >>>>> select statement w/o associated parameters. This would also dovetail >>>>> nicely >>>>> with the semantics CompiledQuery.Compile(), as you can provide >>>>> parameters to the expression you're compiling: >>>>> >>>>> var pepleWithLastName = CompiledQuery.Compile( >>>>> (PeopleDb db, string lastName, int start, int count) => >>>>> (from p in db.People >>>>> where p.LastName == lastName >>>>> select p) >>>>> .Skip(start) >>>>> .Take(count)); >>>>> foreach (p in peopleWithLastName(myDB, "Foo", 0, 1)) ... >>>>> >>>>> Alas, CompiledQuery.Compile() will only create delegates accepting 4 >>>>> parameters, but there are workarounds... >>>>> - Jon >>>>> >>>>> >>>>> >>>>> >>>>> On Fri, 2009-05-15 at 16:35 +0200, Giacomo Tesio wrote: >>>>> >>>>> I'll really hope is it right... :-| >>>>> It would be a great problem otherwise. >>>>> >>>>> AFAIK, the datacontext rappresent a UnitOfWork. If the unit of work is >>>>> long as the application lifetime, than keeping it alive is right. >>>>> But in an internet application delivered via http, I've got my doubt. >>>>> >>>>> If it's not readonly, the tracked entities would become an unaligned >>>>> copy of the full database... in memory. >>>>> >>>>> >>>>> That said, does DbLinq support compiled queries? >>>>> >>>>> The problem with this approach, would be that our queries are not >>>>> static: we progressively add clausoles to IQueryable<T>. >>>>> And, how to handle with closures? >>>>> >>>>> >>>>> Giacomo >>>>> >>>>> >>>>> >>>>> On Fri, May 15, 2009 at 1:37 PM, Jonathan Pryor <[email protected]> >>>>> wrote: >>>>> >>>>> I think you'll find that you're doing it wrong. :-) >>>>> >>>>> First, I'm not sure that the assertion that all apps have short-lived >>>>> DataContexts is correct. That's certainly not the case for >>>>> NerdDinner, which has only one DataContext for the lifetime of the >>>>> app. Many apps may have short-lived DataContexts, but many won't. >>>>> >>>>> Secondly, and primarily, Microsoft doesn't do implicit query caching. >>>>> They do *explicit* query caching: >>>>> >>>>> >>>>> http://blogs.msdn.com/ricom/archive/2008/01/11/performance-quiz-13-linq-to-sql-compiled-queries-cost.aspx >>>>> >>>>> http://blogs.msdn.com/ricom/archive/2008/01/14/performance-quiz-13-linq-to-sql-compiled-query-cost-solution.aspx >>>>> >>>>> For example: >>>>> >>>>> var fq = CompiledQuery.Compile >>>>> ( >>>>> (Northwinds nw) => >>>>> (from o in nw.Orders >>>>> select new >>>>> { >>>>> OrderID = o.OrderID, >>>>> CustomerID = o.CustomerID, >>>>> EmployeeID = o.EmployeeID, >>>>> ShippedDate = o.ShippedDate >>>>> }).Take(5) >>>>> ); >>>>> >>>>> The result of CompiledQuery.Compile is a pre-compiled, pre-analyzed >>>>> query, which the *user* is responsible for caching and dealing with. >>>>> Query caches are *not* part of DataContext itself, *precisely >>>>> because*it's a recipe for a giant memory leak. >>>>> >>>>> (Consider a fictional app which uses Linq-to-SQL once at startup, or >>>>> otherwise very infrequently. The DbLinq approach would assure that the >>>>> original cached queries would never be freed; it would be a permanent >>>>> memory >>>>> tax on the app.) >>>>> >>>>> I would *strongly *suggest that you follow Microsoft's approach, drop >>>>> the DataContext query caching, and use CompiledQuery instead. >>>>> >>>>> - Jon >>>>> >>>>> >>>>> >>>>> On Fri, 2009-05-15 at 10:25 +0200, Giacomo Tesio wrote: >>>>> >>>>> OFFTOPIC about the interfaces >>>>> IQueryCache is an example of an interface which must not be removed. >>>>> This way if a skilled programmer would really need it, it could >>>>> reimplement >>>>> it to better fill it's requirement. >>>>> >>>>> Even if using ReaderWriterLockSlim, thread safety has a cost: some one >>>>> could require speed rather than thread safety. >>>>> Than it should be possible to reimplement it. >>>>> >>>>> Another use of such an interface could be to move the cache out of the >>>>> appdomain (in dedicated cache servers) and share them among servers. This >>>>> would also make the cache livecicle longher than the iis application. >>>>> I'm not a fan of this solution (I'm not sure this would lead in better >>>>> performances), but in an enteprise environment like ours, it had to be >>>>> possible to take such a decision later. >>>>> >>>>> That's why I think that good interfaces are a good thing: we should not >>>>> decide how DbLinq as to be used (tecnologically speaking of course). >>>>> >>>>> Having GOOD internal interfaces allow better flexibility in the long >>>>> run, and produce a better open source product. >>>>> >>>>> >>>>> BAD interfaces, on the other hand, reduce flexibility and improve >>>>> developments effort, but I think that they always underling a wrong >>>>> analisis >>>>> or design (if not a completely missing one). >>>>> >>>>> >>>>> I encounter often .NET programmers talking against interfaces. But till >>>>> now, I've always noticed they are talking about wrong interfaces they have >>>>> designed bottom up. It could took much time to explain a Microsoft .NET >>>>> developer that the problem are not the interfaces, the problem is their >>>>> design. >>>>> >>>>> More or less like explaing them that a Domain Model IS Object >>>>> Orientation, not a way of doing Object Orientation. >>>>> Or to explain them that a object oriented language does not lead by >>>>> itself to object oriented software... >>>>> >>>>> >>>>> Ok... I don't like Microsoft. :-D >>>>> >>>>> >>>>> >>>>> Giacomo >>>>> >>>>> >>>>> On Fri, May 15, 2009 at 10:00 AM, Giacomo Tesio <[email protected]> >>>>> wrote: >>>>> >>>>> That's a good question, i think. >>>>> >>>>> QueryCache is a cache of generated queries for each expression tree >>>>> evaluated. >>>>> It actually has to be static (at least thread static, but this would >>>>> multiply the memory usage per number of threads, also reducing the hits >>>>> and >>>>> reducing livetime to the thread one) to improve the hits, since in the >>>>> most >>>>> common DataContext use case, it rappresents a unit of work and have short >>>>> live. >>>>> If the QueryCache livecicle would match the DataContext one, probably >>>>> it would have no reason to exists. >>>>> >>>>> In "our" DbLinq use case, a readonly DataContext is used from all >>>>> threads and has a long live (the AppDomain one), while single unit of >>>>> works >>>>> actually are created per request or on a per need basis. >>>>> >>>>> The global readonly one, has to be no "instance caches" at all (no >>>>> object tracking for example, and I hope there are no other caches... but >>>>> actually I should indagate this more), but still need a QueryCache becouse >>>>> it could share its yet parsed expression tree among threads. >>>>> >>>>> Moreover all the DbLinq DataContexts would benefit from such a static >>>>> queries, greatly increasing the performances (I hope! ! ! :-D). >>>>> >>>>> Other caches (like the EntityTrakings one) must not be static since >>>>> they are conceptually linked to the DataContext that fill and use them. >>>>> >>>>> >>>>> >>>>> >>>>> Giacomo >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, May 14, 2009 at 5:19 PM, Jonathan Pryor <[email protected]> >>>>> wrote: >>>>> >>>>> This is bound to be a stupid/silly question, but why do the caches need >>>>> to be static? Static data is effectively global, i.e. a GC root in >>>>> and of itself, and thus will never be collected. Even with a good policy, >>>>> it's possible that this could use more memory than people would expect. >>>>> >>>>> See also: >>>>> >>>>> http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx<http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx> >>>>> http://blogs.msdn.com/ricom/archive/2004/01/19/60280.aspx >>>>> >>>>> Is there *really* a need for a cache that's static (i.e. shared >>>>> amongst all DataContext instances)? Or can it just be non-static and >>>>> attached to the DataContext (which would also remove all thread safety >>>>> requirements). >>>>> >>>>> Put another way, with non-shared caches if the DataContext gets >>>>> collected then the cache is also collected, thus providing a natural >>>>> mechanism to clear the cache. With shared (static) caches, they're >>>>> not connected to the DataContext, and thus it could be holding cached >>>>> data for a DataContext that no longer exists. (This may not be the >>>>> case anyway; I haven't fully read and understood the code. I'm just >>>>> trying >>>>> to make clear that preferring shared caches isn't an open and shut easy >>>>> decision.) >>>>> >>>>> Thanks, >>>>> - Jon >>>>> >>>>> >>>>> >>>>> On Thu, 2009-05-14 at 16:55 +0200, Giacomo Tesio wrote: >>>>> >>>>> I've two need: >>>>> - Thread Safety of static caches: should be done for QueryCache, but >>>>> Jon have encountered a strange (unreproduced on tests) bug while working >>>>> on >>>>> NerdDinner. If no other static cache exists they are ok. >>>>> >>>>> >>>>> - XmlMappingSource working correctly: now, associations are not loaded >>>>> from external mappings. This fix require IDataMapper and DataMapper >>>>> modifications and DataContext fixes. >>>>> >>>>> >>>>> The first is absolutelly needed (but should yet work right, just >>>>> missing true multithread test on multi core machines). >>>>> The second I think is really important, but require a bit of work. >>>>> >>>>> >>>>> >>>>> Giacomo >>>>> >>>>> >>>>> >>>>> On Thu, May 14, 2009 at 4:28 PM, Sharique <[email protected]> wrote: >>>>> >>>>> >>>>> Hi, >>>>> It has been almost 1 year since last release, I think it is time make >>>>> a new release (guess 0.19) . If there is any blocking issue, pls put >>>>> it here for discussion. So that we can resolve it quickly. >>>>> >>>>> -- >>>>> Sharique >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >> > > > > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "DbLinq" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/dblinq?hl=en -~----------~----~----~----~------~----~------~--~---
