A note: on cache hit the expression tree visitor should be far faster than the one creating the cached select query (that without the param values), becouse it should only handle parameters itself.
I'll probably implement a tree visitor focused to handle them. What do you think about this? Giacomo On Sat, May 16, 2009 at 8:54 PM, Giacomo Tesio <[email protected]> wrote: > No it probably wasn't... > > But with the old approach the query cache hit only with queries with > identical parameter values. This lead in a performance problem, since memory > grow too much (a query is cached for each values' combination) and (since > the hash code provided in ExpressionEqualityComparer didn't include the > parameters' values) after a while the queries compared for equality during > to the dictionary lookup would become too much. > > When I added the string comparison, I actually didn't go so deep in the > code to understand that the parameters' values where cached too (even if I > did some test which didn't show this problem... wrong test!) > > Let me some hours to search a better approach. > Probably, the best one, would be to split the SelectQuery in two parts, one > immutable and cachable (without the parameter values) and one with > parameters' values included. > > The cache hit would provide a select query which actually require a parse > of the expression tree (if I can not find a way to cache the closures) to > fill the query with those values. > > > That said, there are surely case (even for other use we do of DbLinq) where > CompileQuery would be a better approach, and time allowing, I'll try to > implement it. > > > > Giacomo > > On Sat, May 16, 2009 at 3:58 PM, Pascal Craponne <[email protected]>wrote: > >> I took a look at the history, and it is now just a string comparison. >> Was it failing before the end of april (27th)? >> >> On Sat, May 16, 2009 at 15:50, Pascal Craponne <[email protected]> wrote: >> >>> Hi, >>> right, two expressions at different places have a different signature >>> (hash and reference). Even the same expression called twice has a difference >>> signature. This is why when I wrote the Expression comparer I chosed the >>> option to compare the comparable contents. >>> I'm pretty sure it was working at that time. >>> Someone probably added a new comparison, which now causes comparisons to >>> fail. >>> >>> Do we have simple samples of comparison returning an equality where it >>> should not? >>> Do we also have opposite samples? >>> >>> I don't like the ToString() approach, since it is not proven to be 100% >>> reliable (I'm not sure all sub expressions generate something and something >>> unique where converted to a string). >>> >>> Pascal. >>> >>> On Sat, May 16, 2009 at 13:55, Jonathan Pryor <[email protected]> wrote: >>> >>>> Behold the actual underlying problem to why QueryCache is broken: >>>> >>>> var id = "foo"; >>>> var a = db.Customers.Where(c => c.CustomerID == id); >>>> id = "bar"; >>>> var b = db.Customers.Where(c => c.CustomerID == id); >>>> Console.WriteLine("HashCodes: a={0}; b={1}", a.GetHashCode(), >>>> b.GetHashCode()); >>>> Console.WriteLine("a == b? {0}", a.Equals(b)); >>>> >>>> Output while running under .NET: >>>> >>>> HashCodes: a=38068051; b=17286255 >>>> a == b? False >>>> >>>> In short, it is *not* appropriate to use an expression tree as a key >>>> into a Dictionary, as they do not demonstrate value equality properties. >>>> >>>> QueryCache is broken (and effectively unfixably broken) because the >>>> only useful key into the cache that we have...can't actually be used as a >>>> key. >>>> >>>> (Though I suppose we could instead the result of ToString() as the key >>>> into the cache, but more on that below.) >>>> >>>> However, just using ToString() won't fix the unit test breakage I wrote >>>> yesterday. It will instead just make the *existing* (un-patched) >>>> ReadTest.A5_SelectSingleOrDefault() test fail (for precisely the same >>>> "input parameter values are *also* cached" issue I referred to >>>> yesterday). >>>> >>>> Finally, I'd like to put try one last effort at pointing out why >>>> CompiledQuery.Compile() is a better approach. Executing a Linq-to-SQL >>>> query takes 3 steps: >>>> >>>> 1. Create the expression tree (done by the compiler and at runtime >>>> via Queryable extension methods). >>>> 2. Convert the expression tree into SQL code (done by DbLinq). >>>> 3. Execute the SQL against the database. >>>> >>>> The current QueryCache approach only caches the result of (2), and if >>>> you've ever profiled expression trees you'll find that (1) can also be >>>> quite >>>> slow (often unexpectedly so). (Which is why, while using ToString()would >>>> "fix" the key lookup over using the expression tree itself, still >>>> isn't an ideal fix: ToString() requires traversing the (possibly quite >>>> large) expression tree, and would thus add additional overhead just to >>>> perform the cache lookup.) >>>> >>>> Meanwhile, CompiledQuery.Compile() allows caching *both* steps (1) *and >>>> * (2). There is no need for the delegate returned by >>>> CompiledQuery.Compile() to re-create the expression tree, nor to >>>> re-parse it, which saves time (which is why CompiledQuery exists in the >>>> first place). >>>> >>>> *Please* consider implementing CompiledQuery. >>>> >>>> - Jon >>>> >>>> >>>> On Sat, 2009-05-16 at 09:33 +0200, Giacomo Tesio wrote: >>>> >>>> AFAIK the closures are a sort of "pointer" to the value (actually to a >>>> wrapper of the context that is used). >>>> >>>> >>>> >>>> I suppose that in this case, someone else (like the CLR) would update >>>> the closures values so that the yet compiled query would find the correct >>>> values in the environment. >>>> >>>> >>>> >>>> >>>> >>>> I promise I'll take a look to the QueryCache again monday, but I don't >>>> believe we could actually remove the cache, since the DbLinq's performance >>>> would make it unusable. >>>> >>>> >>>> >>>> The problem here is understanding fully the CLR managing of closures and >>>> understanding why at different time you get different closures. We could >>>> also include the closures in the expression comparisation (which actually >>>> work only on the string rappresentation of the expression itself). >>>> >>>> >>>> >>>> >>>> >>>> Giacomo >>>> >>>> PS: as to the connected datacontext approach: I think that long lived >>>> datacontext should not be "connected", so ours will have >>>> ObjectTrackingEnable == false. >>>> >>>> >>>> >>>> >>>> >>>> On Fri, May 15, 2009 at 7:34 PM, Jonathan Pryor <[email protected]> >>>> wrote: >>>> >>>> I think the correct approach is to use CompiledQuery, for precisely >>>> the cache management reasons I originally outlined. >>>> >>>> Though long-lived DataContexts would be problematic, as you point out, >>>> as you'd effectively clone the entire DB into memory given enough time and >>>> queries... I don't see how a connected database approach (which >>>> DataContext follows) could behave otherwise. >>>> >>>> Does DbLinq support compiled queries? find says: >>>> >>>> $ find src -name CompiledQuery.cs >>>> src/DbLinq/System.Data.Linq/CompiledQuery.cs >>>> >>>> Survey says...No. I believe *we should*. >>>> >>>> I'm not sure what you mean by our queries being static. Once we cache a >>>> query, we don't change it anymore. (If we did, we'd be altering the >>>> original query, which would be...bad.) >>>> >>>> I'm also not sure what you're getting at with closures. >>>> >>>> The way I envision CompiledQuery working is very similar/identical to >>>> how your existing query caching works, except instead of storing the >>>> SelectQuery type within QueryCache, it would instead be stored within >>>> the delegate returned from CompiledQuery.Compile(). Thus, the user >>>> actually deals with pre-parsed query statements, and DbLinq doesn't need to >>>> re-parse the LINQ statement again (as the delegate returned by >>>> CompiledQuery.Compile() stores the pre-parsed SelectQuery instance). >>>> >>>> I'm not sure how much work this would take, but I'm hopeful that it >>>> wouldn't be too much work. >>>> >>>> *However*, this does bring up two related issues. >>>> >>>> 1. I still need to figure out wtf is going on with the NerdDinner >>>> caching bug I was seeing last week (and better, how to reproduce in the >>>> unit >>>> tests so that you can take a better look at it). >>>> >>>> 2. SelectQuery.GetCommand() gives me *really* bad feelings, because: >>>> >>>> 1. It takes no arguments. >>>> 2. It calls InputParameterExpression.GetValue() with no parameters. >>>> 3. (A) and (B) together imply that, even though the underlying >>>> SELECT takes named parameters (yay), there's no way to actually >>>> provide them/alter them for the current SelectQuery instance (wtf?). >>>> >>>> >>>> I *think* this is why (1) fails for me (but again, I still need to >>>> debug). >>>> >>>> In any event, it makes no sense to me at all. The point to having a >>>> SELECT with parameters is so that you can cache the expression itself >>>> but vary the parameters. But since we're not providing any parameters, the >>>> parameters can't vary. So... >>>> >>>> It makes my head hurt, if nothing else. >>>> >>>> I would instead expect AbstractQuery.GetCommand() to take an 'object[] >>>> parameters' argument (or similar) so that we can cache the actual >>>> select statement w/o associated parameters. This would also dovetail >>>> nicely >>>> with the semantics CompiledQuery.Compile(), as you can provide >>>> parameters to the expression you're compiling: >>>> >>>> var pepleWithLastName = CompiledQuery.Compile( >>>> (PeopleDb db, string lastName, int start, int count) => >>>> (from p in db.People >>>> where p.LastName == lastName >>>> select p) >>>> .Skip(start) >>>> .Take(count)); >>>> foreach (p in peopleWithLastName(myDB, "Foo", 0, 1)) ... >>>> >>>> Alas, CompiledQuery.Compile() will only create delegates accepting 4 >>>> parameters, but there are workarounds... >>>> - Jon >>>> >>>> >>>> >>>> >>>> On Fri, 2009-05-15 at 16:35 +0200, Giacomo Tesio wrote: >>>> >>>> I'll really hope is it right... :-| >>>> It would be a great problem otherwise. >>>> >>>> AFAIK, the datacontext rappresent a UnitOfWork. If the unit of work is >>>> long as the application lifetime, than keeping it alive is right. >>>> But in an internet application delivered via http, I've got my doubt. >>>> >>>> If it's not readonly, the tracked entities would become an unaligned >>>> copy of the full database... in memory. >>>> >>>> >>>> That said, does DbLinq support compiled queries? >>>> >>>> The problem with this approach, would be that our queries are not >>>> static: we progressively add clausoles to IQueryable<T>. >>>> And, how to handle with closures? >>>> >>>> >>>> Giacomo >>>> >>>> >>>> >>>> On Fri, May 15, 2009 at 1:37 PM, Jonathan Pryor <[email protected]> >>>> wrote: >>>> >>>> I think you'll find that you're doing it wrong. :-) >>>> >>>> First, I'm not sure that the assertion that all apps have short-lived >>>> DataContexts is correct. That's certainly not the case for NerdDinner, >>>> which has only one DataContext for the lifetime of the app. Many apps >>>> may have short-lived DataContexts, but many won't. >>>> >>>> Secondly, and primarily, Microsoft doesn't do implicit query caching. >>>> They do *explicit* query caching: >>>> >>>> >>>> http://blogs.msdn.com/ricom/archive/2008/01/11/performance-quiz-13-linq-to-sql-compiled-queries-cost.aspx >>>> >>>> http://blogs.msdn.com/ricom/archive/2008/01/14/performance-quiz-13-linq-to-sql-compiled-query-cost-solution.aspx >>>> >>>> For example: >>>> >>>> var fq = CompiledQuery.Compile >>>> ( >>>> (Northwinds nw) => >>>> (from o in nw.Orders >>>> select new >>>> { >>>> OrderID = o.OrderID, >>>> CustomerID = o.CustomerID, >>>> EmployeeID = o.EmployeeID, >>>> ShippedDate = o.ShippedDate >>>> }).Take(5) >>>> ); >>>> >>>> The result of CompiledQuery.Compile is a pre-compiled, pre-analyzed >>>> query, which the *user* is responsible for caching and dealing with. >>>> Query caches are *not* part of DataContext itself, *precisely because*it's >>>> a recipe for a giant memory leak. >>>> >>>> (Consider a fictional app which uses Linq-to-SQL once at startup, or >>>> otherwise very infrequently. The DbLinq approach would assure that the >>>> original cached queries would never be freed; it would be a permanent >>>> memory >>>> tax on the app.) >>>> >>>> I would *strongly *suggest that you follow Microsoft's approach, drop >>>> the DataContext query caching, and use CompiledQuery instead. >>>> >>>> - Jon >>>> >>>> >>>> >>>> On Fri, 2009-05-15 at 10:25 +0200, Giacomo Tesio wrote: >>>> >>>> OFFTOPIC about the interfaces >>>> IQueryCache is an example of an interface which must not be removed. >>>> This way if a skilled programmer would really need it, it could reimplement >>>> it to better fill it's requirement. >>>> >>>> Even if using ReaderWriterLockSlim, thread safety has a cost: some one >>>> could require speed rather than thread safety. >>>> Than it should be possible to reimplement it. >>>> >>>> Another use of such an interface could be to move the cache out of the >>>> appdomain (in dedicated cache servers) and share them among servers. This >>>> would also make the cache livecicle longher than the iis application. >>>> I'm not a fan of this solution (I'm not sure this would lead in better >>>> performances), but in an enteprise environment like ours, it had to be >>>> possible to take such a decision later. >>>> >>>> That's why I think that good interfaces are a good thing: we should not >>>> decide how DbLinq as to be used (tecnologically speaking of course). >>>> >>>> Having GOOD internal interfaces allow better flexibility in the long >>>> run, and produce a better open source product. >>>> >>>> >>>> BAD interfaces, on the other hand, reduce flexibility and improve >>>> developments effort, but I think that they always underling a wrong >>>> analisis >>>> or design (if not a completely missing one). >>>> >>>> >>>> I encounter often .NET programmers talking against interfaces. But till >>>> now, I've always noticed they are talking about wrong interfaces they have >>>> designed bottom up. It could took much time to explain a Microsoft .NET >>>> developer that the problem are not the interfaces, the problem is their >>>> design. >>>> >>>> More or less like explaing them that a Domain Model IS Object >>>> Orientation, not a way of doing Object Orientation. >>>> Or to explain them that a object oriented language does not lead by >>>> itself to object oriented software... >>>> >>>> >>>> Ok... I don't like Microsoft. :-D >>>> >>>> >>>> >>>> Giacomo >>>> >>>> >>>> On Fri, May 15, 2009 at 10:00 AM, Giacomo Tesio <[email protected]> >>>> wrote: >>>> >>>> That's a good question, i think. >>>> >>>> QueryCache is a cache of generated queries for each expression tree >>>> evaluated. >>>> It actually has to be static (at least thread static, but this would >>>> multiply the memory usage per number of threads, also reducing the hits and >>>> reducing livetime to the thread one) to improve the hits, since in the most >>>> common DataContext use case, it rappresents a unit of work and have short >>>> live. >>>> If the QueryCache livecicle would match the DataContext one, probably it >>>> would have no reason to exists. >>>> >>>> In "our" DbLinq use case, a readonly DataContext is used from all >>>> threads and has a long live (the AppDomain one), while single unit of works >>>> actually are created per request or on a per need basis. >>>> >>>> The global readonly one, has to be no "instance caches" at all (no >>>> object tracking for example, and I hope there are no other caches... but >>>> actually I should indagate this more), but still need a QueryCache becouse >>>> it could share its yet parsed expression tree among threads. >>>> >>>> Moreover all the DbLinq DataContexts would benefit from such a static >>>> queries, greatly increasing the performances (I hope! ! ! :-D). >>>> >>>> Other caches (like the EntityTrakings one) must not be static since they >>>> are conceptually linked to the DataContext that fill and use them. >>>> >>>> >>>> >>>> >>>> Giacomo >>>> >>>> >>>> >>>> >>>> On Thu, May 14, 2009 at 5:19 PM, Jonathan Pryor <[email protected]> >>>> wrote: >>>> >>>> This is bound to be a stupid/silly question, but why do the caches need >>>> to be static? Static data is effectively global, i.e. a GC root in and >>>> of itself, and thus will never be collected. Even with a good policy, it's >>>> possible that this could use more memory than people would expect. >>>> >>>> See also: >>>> >>>> http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx<http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx> >>>> http://blogs.msdn.com/ricom/archive/2004/01/19/60280.aspx >>>> >>>> Is there *really* a need for a cache that's static (i.e. shared amongst >>>> all DataContext instances)? Or can it just be non-static and attached >>>> to the DataContext (which would also remove all thread safety >>>> requirements). >>>> >>>> Put another way, with non-shared caches if the DataContext gets >>>> collected then the cache is also collected, thus providing a natural >>>> mechanism to clear the cache. With shared (static) caches, they're not >>>> connected to the DataContext, and thus it could be holding cached data >>>> for a DataContext that no longer exists. (This may not be the case >>>> anyway; I haven't fully read and understood the code. I'm just trying to >>>> make clear that preferring shared caches isn't an open and shut easy >>>> decision.) >>>> >>>> Thanks, >>>> - Jon >>>> >>>> >>>> >>>> On Thu, 2009-05-14 at 16:55 +0200, Giacomo Tesio wrote: >>>> >>>> I've two need: >>>> - Thread Safety of static caches: should be done for QueryCache, but Jon >>>> have encountered a strange (unreproduced on tests) bug while working on >>>> NerdDinner. If no other static cache exists they are ok. >>>> >>>> >>>> - XmlMappingSource working correctly: now, associations are not loaded >>>> from external mappings. This fix require IDataMapper and DataMapper >>>> modifications and DataContext fixes. >>>> >>>> >>>> The first is absolutelly needed (but should yet work right, just missing >>>> true multithread test on multi core machines). >>>> The second I think is really important, but require a bit of work. >>>> >>>> >>>> >>>> Giacomo >>>> >>>> >>>> >>>> On Thu, May 14, 2009 at 4:28 PM, Sharique <[email protected]> wrote: >>>> >>>> >>>> Hi, >>>> It has been almost 1 year since last release, I think it is time make >>>> a new release (guess 0.19) . If there is any blocking issue, pls put >>>> it here for discussion. So that we can resolve it quickly. >>>> >>>> -- >>>> Sharique >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >> >> >> >> > --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "DbLinq" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/dblinq?hl=en -~----------~----~----~----~------~----~------~--~---
