[dblinq] Re: new release ?

Pascal Craponne Sat, 16 May 2009 13:03:31 -0700

OK, so I suggest to switch back to tree visitor and ignore closures values
(not types) in the comparison process.


On Sat, May 16, 2009 at 21:08, Giacomo Tesio <[email protected]> wrote:

> A note: on cache hit the expression tree visitor should be far faster than
> the one creating the cached select query (that without the param values),
> becouse it should only handle parameters itself.
>
>
> I'll probably implement a tree visitor focused to handle them.
>
>
> What do you think about this?
>
>
> Giacomo
>
> On Sat, May 16, 2009 at 8:54 PM, Giacomo Tesio <[email protected]> wrote:
>
>> No it probably wasn't...
>>
>> But with the old approach the query cache hit only with queries with
>> identical parameter values. This lead in a performance problem, since memory
>> grow too much (a query is cached for each values' combination) and (since
>> the hash code provided in ExpressionEqualityComparer didn't include the
>> parameters' values) after a while the queries compared for equality during
>> to the dictionary lookup would become too much.
>>
>>  When I added the string comparison, I actually didn't go so deep in the
>> code to understand that the parameters' values where cached too (even if I
>> did some test which didn't show this problem... wrong test!)
>>
>> Let me some hours to search a better approach.
>> Probably, the best one, would be to split the SelectQuery in two parts,
>> one immutable and cachable (without the parameter values) and one with
>> parameters' values included.
>>
>> The cache hit would provide a select query which actually require a parse
>> of the expression tree (if I can not find a way to cache the closures) to
>> fill the query with those values.
>>
>>
>> That said, there are surely case (even for other use we do of
>> DbLinq) where CompileQuery would be a better approach, and time allowing,
>> I'll try to implement it.
>>
>>
>>
>> Giacomo
>>
>>   On Sat, May 16, 2009 at 3:58 PM, Pascal Craponne <[email protected]>wrote:
>>
>>> I took a look at the history, and it is now just a string comparison.
>>> Was it failing before the end of april (27th)?
>>>
>>> On Sat, May 16, 2009 at 15:50, Pascal Craponne <[email protected]> wrote:
>>>
>>>> Hi,
>>>> right, two expressions at different places have a different signature
>>>> (hash and reference). Even the same expression called twice has a 
>>>> difference
>>>> signature. This is why when I wrote the Expression comparer I chosed the
>>>> option to compare the comparable contents.
>>>> I'm pretty sure it was working at that time.
>>>> Someone probably added a new comparison, which now causes comparisons to
>>>> fail.
>>>>
>>>> Do we have simple samples of comparison returning an equality where it
>>>> should not?
>>>> Do we also have opposite samples?
>>>>
>>>> I don't like the ToString() approach, since it is not proven to be 100%
>>>> reliable (I'm not sure all sub expressions generate something and something
>>>> unique where converted to a string).
>>>>
>>>> Pascal.
>>>>
>>>> On Sat, May 16, 2009 at 13:55, Jonathan Pryor <[email protected]> wrote:
>>>>
>>>>> Behold the actual underlying problem to why QueryCache is broken:
>>>>>
>>>>> var id = "foo";
>>>>> var a = db.Customers.Where(c => c.CustomerID == id);
>>>>> id = "bar";
>>>>> var b = db.Customers.Where(c => c.CustomerID == id);
>>>>> Console.WriteLine("HashCodes: a={0}; b={1}", a.GetHashCode(), 
>>>>> b.GetHashCode());
>>>>> Console.WriteLine("a == b? {0}", a.Equals(b));
>>>>>
>>>>> Output while running under .NET:
>>>>>
>>>>> HashCodes: a=38068051; b=17286255
>>>>> a == b? False
>>>>>
>>>>> In short, it is *not* appropriate to use an expression tree as a key
>>>>> into a Dictionary, as they do not demonstrate value equality properties.
>>>>>
>>>>> QueryCache is broken (and effectively unfixably broken) because the
>>>>> only useful key into the cache that we have...can't actually be used as a
>>>>> key.
>>>>>
>>>>> (Though I suppose we could instead the result of ToString() as the key
>>>>> into the cache, but more on that below.)
>>>>>
>>>>> However, just using ToString() won't fix the unit test breakage I
>>>>> wrote yesterday.  It will instead just make the *existing*(un-patched)
>>>>> ReadTest.A5_SelectSingleOrDefault() test fail (for precisely the same
>>>>> "input parameter values are *also* cached" issue I referred to
>>>>> yesterday).
>>>>>
>>>>> Finally, I'd like to put try one last effort at pointing out why
>>>>> CompiledQuery.Compile() is a better approach.  Executing a Linq-to-SQL
>>>>> query takes 3 steps:
>>>>>
>>>>>    1. Create the expression tree (done by the compiler and at runtime
>>>>>    via Queryable extension methods).
>>>>>    2. Convert the expression tree into SQL code (done by DbLinq).
>>>>>    3. Execute the SQL against the database.
>>>>>
>>>>> The current QueryCache approach only caches the result of (2), and if
>>>>> you've ever profiled expression trees you'll find that (1) can also be 
>>>>> quite
>>>>> slow (often unexpectedly so).  (Which is why, while using ToString()would 
>>>>> "fix" the key lookup over using the expression tree itself, still
>>>>> isn't an ideal fix: ToString() requires traversing the (possibly quite
>>>>> large) expression tree, and would thus add additional overhead just to
>>>>> perform the cache lookup.)
>>>>>
>>>>> Meanwhile, CompiledQuery.Compile() allows caching *both* steps (1) *
>>>>> and* (2).  There is no need for the delegate returned by
>>>>> CompiledQuery.Compile() to re-create the expression tree, nor to
>>>>> re-parse it, which saves time (which is why CompiledQuery exists in
>>>>> the first place).
>>>>>
>>>>> *Please* consider implementing CompiledQuery.
>>>>>
>>>>> - Jon
>>>>>
>>>>>
>>>>> On Sat, 2009-05-16 at 09:33 +0200, Giacomo Tesio wrote:
>>>>>
>>>>> AFAIK the closures are a sort of "pointer" to the value (actually to a
>>>>> wrapper of the context that is used).
>>>>>
>>>>>
>>>>>
>>>>> I suppose that in this case, someone else (like the CLR) would update
>>>>> the closures values so that the yet compiled query would find the correct
>>>>> values in the environment.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I promise I'll take a look to the QueryCache again monday, but I don't
>>>>> believe we could actually remove the cache, since the DbLinq's performance
>>>>> would make it unusable.
>>>>>
>>>>>
>>>>>
>>>>> The problem here is understanding fully the CLR managing of closures
>>>>> and understanding why at different time you get different closures. We 
>>>>> could
>>>>> also include the closures in the expression comparisation (which actually
>>>>> work only on the string rappresentation of the expression itself).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Giacomo
>>>>>
>>>>> PS: as to the connected datacontext approach: I think that long lived
>>>>> datacontext should not be "connected", so ours will have
>>>>> ObjectTrackingEnable == false.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 15, 2009 at 7:34 PM, Jonathan Pryor <[email protected]>
>>>>> wrote:
>>>>>
>>>>>  I think the correct approach is to use CompiledQuery, for precisely
>>>>> the cache management reasons I originally outlined.
>>>>>
>>>>> Though long-lived DataContexts would be problematic, as you point out,
>>>>> as you'd effectively clone the entire DB into memory given enough time and
>>>>> queries...  I don't see how a connected database approach (which
>>>>> DataContext follows) could behave otherwise.
>>>>>
>>>>> Does DbLinq support compiled queries?  find says:
>>>>>
>>>>> $ find src -name CompiledQuery.cs
>>>>> src/DbLinq/System.Data.Linq/CompiledQuery.cs
>>>>>
>>>>> Survey says...No.  I believe *we should*.
>>>>>
>>>>> I'm not sure what you mean by our queries being static.  Once we cache
>>>>> a query, we don't change it anymore.  (If we did, we'd be altering the
>>>>> original query, which would be...bad.)
>>>>>
>>>>> I'm also not sure what you're getting at with closures.
>>>>>
>>>>> The way I envision CompiledQuery working is very similar/identical to
>>>>> how your existing query caching works, except instead of storing the
>>>>> SelectQuery type within QueryCache, it would instead be stored within
>>>>> the delegate returned from CompiledQuery.Compile().  Thus, the user
>>>>> actually deals with pre-parsed query statements, and DbLinq doesn't need 
>>>>> to
>>>>> re-parse the LINQ statement again (as the delegate returned by
>>>>> CompiledQuery.Compile() stores the pre-parsed SelectQuery instance).
>>>>>
>>>>> I'm not sure how much work this would take, but I'm hopeful that it
>>>>> wouldn't be too much work.
>>>>>
>>>>> *However*, this does bring up two related issues.
>>>>>
>>>>> 1. I still need to figure out wtf is going on with the NerdDinner
>>>>> caching bug I was seeing last week (and better, how to reproduce in the 
>>>>> unit
>>>>> tests so that you can take a better look at it).
>>>>>
>>>>> 2. SelectQuery.GetCommand() gives me *really* bad feelings, because:
>>>>>
>>>>>    1. It takes no arguments.
>>>>>    2. It calls InputParameterExpression.GetValue() with no parameters.
>>>>>
>>>>>    3. (A) and (B) together imply that, even though the underlying
>>>>>    SELECT takes named parameters (yay), there's no way to actually
>>>>>    provide them/alter them for the current SelectQuery instance
>>>>>    (wtf?).
>>>>>
>>>>> I *think* this is why (1) fails for me (but again, I still need to
>>>>> debug).
>>>>>
>>>>> In any event, it makes no sense to me at all.  The point to having a
>>>>> SELECT with parameters is so that you can cache the expression itself
>>>>> but vary the parameters.  But since we're not providing any parameters, 
>>>>> the
>>>>> parameters can't vary.  So...
>>>>>
>>>>> It makes my head hurt, if nothing else.
>>>>>
>>>>> I would instead expect AbstractQuery.GetCommand() to take an 'object[]
>>>>> parameters' argument (or similar) so that we can cache the actual
>>>>> select statement w/o associated parameters.  This would also dovetail 
>>>>> nicely
>>>>> with the semantics CompiledQuery.Compile(), as you can provide
>>>>> parameters to the expression you're compiling:
>>>>>
>>>>> var pepleWithLastName = CompiledQuery.Compile(
>>>>>   (PeopleDb db, string lastName, int start, int count) =>
>>>>>           (from p in db.People
>>>>>            where p.LastName == lastName
>>>>>            select p)
>>>>>           .Skip(start)
>>>>>           .Take(count));
>>>>> foreach (p in peopleWithLastName(myDB, "Foo", 0, 1)) ...
>>>>>
>>>>> Alas, CompiledQuery.Compile() will only create delegates accepting 4
>>>>> parameters, but there are workarounds...
>>>>> - Jon
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 2009-05-15 at 16:35 +0200, Giacomo Tesio wrote:
>>>>>
>>>>> I'll really hope is it right... :-|
>>>>> It would be a great problem otherwise.
>>>>>
>>>>> AFAIK, the datacontext rappresent a UnitOfWork. If the unit of work is
>>>>> long as the application lifetime, than keeping it alive is right.
>>>>> But in an internet application delivered via http, I've got my doubt.
>>>>>
>>>>> If it's not readonly, the tracked entities would become an unaligned
>>>>> copy of the full database... in memory.
>>>>>
>>>>>
>>>>> That said, does DbLinq support compiled queries?
>>>>>
>>>>> The problem with this approach, would be that our queries are not
>>>>> static: we progressively add clausoles to IQueryable<T>.
>>>>> And, how to handle with closures?
>>>>>
>>>>>
>>>>> Giacomo
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 15, 2009 at 1:37 PM, Jonathan Pryor <[email protected]>
>>>>> wrote:
>>>>>
>>>>> I think you'll find that you're doing it wrong. :-)
>>>>>
>>>>> First, I'm not sure that the assertion that all apps have short-lived
>>>>> DataContexts is correct.  That's certainly not the case for
>>>>> NerdDinner, which has only one DataContext for the lifetime of the
>>>>> app.  Many apps may have short-lived DataContexts, but many won't.
>>>>>
>>>>> Secondly, and primarily, Microsoft doesn't do implicit query caching.
>>>>> They do *explicit* query caching:
>>>>>
>>>>>
>>>>> http://blogs.msdn.com/ricom/archive/2008/01/11/performance-quiz-13-linq-to-sql-compiled-queries-cost.aspx
>>>>>
>>>>> http://blogs.msdn.com/ricom/archive/2008/01/14/performance-quiz-13-linq-to-sql-compiled-query-cost-solution.aspx
>>>>>
>>>>> For example:
>>>>>
>>>>> var fq = CompiledQuery.Compile
>>>>> (
>>>>>     (Northwinds nw) =>
>>>>>             (from o in nw.Orders
>>>>>             select new
>>>>>                    {
>>>>>                        OrderID = o.OrderID,
>>>>>                        CustomerID = o.CustomerID,
>>>>>                        EmployeeID = o.EmployeeID,
>>>>>                        ShippedDate = o.ShippedDate
>>>>>                    }).Take(5)
>>>>> );
>>>>>
>>>>> The result of CompiledQuery.Compile is a pre-compiled, pre-analyzed
>>>>> query, which the *user* is responsible for caching and dealing with.
>>>>> Query caches are *not* part of DataContext itself, *precisely 
>>>>> because*it's a recipe for a giant memory leak.
>>>>>
>>>>> (Consider a fictional app which uses Linq-to-SQL once at startup, or
>>>>> otherwise very infrequently.  The DbLinq approach would assure that the
>>>>> original cached queries would never be freed; it would be a permanent 
>>>>> memory
>>>>> tax on the app.)
>>>>>
>>>>> I would *strongly *suggest that you follow Microsoft's approach, drop
>>>>> the DataContext query caching, and use CompiledQuery instead.
>>>>>
>>>>> - Jon
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 2009-05-15 at 10:25 +0200, Giacomo Tesio wrote:
>>>>>
>>>>> OFFTOPIC about the interfaces
>>>>> IQueryCache is an example of an interface which must not be removed.
>>>>> This way if a skilled programmer would really need it, it could 
>>>>> reimplement
>>>>> it to better fill it's requirement.
>>>>>
>>>>> Even if using ReaderWriterLockSlim, thread safety has a cost: some one
>>>>> could require speed rather than thread safety.
>>>>> Than it should be possible to reimplement it.
>>>>>
>>>>> Another use of such an interface could be to move the cache out of the
>>>>> appdomain (in dedicated cache servers) and share them among servers. This
>>>>> would also make the cache livecicle longher than the iis application.
>>>>> I'm not a fan of this solution (I'm not sure this would lead in better
>>>>> performances), but in an enteprise environment like ours, it had to be
>>>>> possible to take such a decision later.
>>>>>
>>>>> That's why I think that good interfaces are a good thing: we should not
>>>>> decide how DbLinq as to be used (tecnologically speaking of course).
>>>>>
>>>>> Having GOOD internal interfaces allow better flexibility in the long
>>>>> run, and produce a better open source product.
>>>>>
>>>>>
>>>>> BAD interfaces, on the other hand, reduce flexibility and improve
>>>>> developments effort, but I think that they always underling a wrong 
>>>>> analisis
>>>>> or design (if not a completely missing one).
>>>>>
>>>>>
>>>>> I encounter often .NET programmers talking against interfaces. But till
>>>>> now, I've always noticed they are talking about wrong interfaces they have
>>>>> designed bottom up. It could took much time to explain a Microsoft .NET
>>>>> developer that the problem are not the interfaces, the problem is their
>>>>> design.
>>>>>
>>>>> More or less like explaing them that a Domain Model IS Object
>>>>> Orientation, not a way of doing Object Orientation.
>>>>> Or to explain them that a object oriented language does not lead by
>>>>> itself to object oriented software...
>>>>>
>>>>>
>>>>> Ok... I don't like Microsoft. :-D
>>>>>
>>>>>
>>>>>
>>>>> Giacomo
>>>>>
>>>>>
>>>>> On Fri, May 15, 2009 at 10:00 AM, Giacomo Tesio <[email protected]>
>>>>> wrote:
>>>>>
>>>>> That's a good question, i think.
>>>>>
>>>>> QueryCache is a cache of generated queries for each expression tree
>>>>> evaluated.
>>>>> It actually has to be static (at least thread static, but this would
>>>>> multiply the memory usage per number of threads, also reducing the hits 
>>>>> and
>>>>> reducing livetime to the thread one) to improve the hits, since in the 
>>>>> most
>>>>> common DataContext use case, it rappresents a unit of work and have short
>>>>> live.
>>>>> If the QueryCache livecicle would match the DataContext one, probably
>>>>> it would have no reason to exists.
>>>>>
>>>>> In "our" DbLinq use case, a readonly DataContext is used from all
>>>>> threads and has a long live (the AppDomain one), while single unit of 
>>>>> works
>>>>> actually are created per request or on a per need basis.
>>>>>
>>>>> The global readonly one, has to be no "instance caches" at all (no
>>>>> object tracking for example, and I hope there are no other caches... but
>>>>> actually I should indagate this more), but still need a QueryCache becouse
>>>>> it could share its yet parsed expression tree among threads.
>>>>>
>>>>> Moreover all the DbLinq DataContexts would benefit from such a static
>>>>> queries, greatly increasing the performances (I hope! ! ! :-D).
>>>>>
>>>>> Other caches (like the EntityTrakings one) must not be static since
>>>>> they are conceptually linked to the DataContext that fill and use them.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Giacomo
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 14, 2009 at 5:19 PM, Jonathan Pryor <[email protected]>
>>>>> wrote:
>>>>>
>>>>> This is bound to be a stupid/silly question, but why do the caches need
>>>>> to be static?  Static data is effectively global, i.e. a GC root in
>>>>> and of itself, and thus will never be collected.  Even with a good policy,
>>>>> it's possible that this could use more memory than people would expect.
>>>>>
>>>>> See also:
>>>>>     
>>>>> http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx<http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx>
>>>>>     http://blogs.msdn.com/ricom/archive/2004/01/19/60280.aspx
>>>>>
>>>>> Is there *really* a need for a cache that's static (i.e. shared
>>>>> amongst all DataContext instances)?  Or can it just be non-static and
>>>>> attached to the DataContext (which would also remove all thread safety
>>>>> requirements).
>>>>>
>>>>> Put another way, with non-shared caches if the DataContext gets
>>>>> collected then the cache is also collected, thus providing a natural
>>>>> mechanism to clear the cache.  With shared (static) caches, they're
>>>>> not connected to the DataContext, and thus it could be holding cached
>>>>> data for a DataContext that no longer exists.  (This may not be the
>>>>> case anyway; I haven't fully read and understood the code.  I'm just 
>>>>> trying
>>>>> to make clear that preferring shared caches isn't an open and shut easy
>>>>> decision.)
>>>>>
>>>>> Thanks,
>>>>> - Jon
>>>>>
>>>>>
>>>>>
>>>>> On Thu, 2009-05-14 at 16:55 +0200, Giacomo Tesio wrote:
>>>>>
>>>>> I've two need:
>>>>> - Thread Safety of static caches: should be done for QueryCache, but
>>>>> Jon have encountered a strange (unreproduced on tests) bug while working 
>>>>> on
>>>>> NerdDinner. If no other static cache exists they are ok.
>>>>>
>>>>>
>>>>> - XmlMappingSource working correctly: now, associations are not loaded
>>>>> from external mappings. This fix require IDataMapper and DataMapper
>>>>> modifications and DataContext fixes.
>>>>>
>>>>>
>>>>> The first is absolutelly needed (but should yet work right, just
>>>>> missing true multithread test on multi core machines).
>>>>> The second I think is really important, but require a bit of work.
>>>>>
>>>>>
>>>>>
>>>>> Giacomo
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 14, 2009 at 4:28 PM, Sharique <[email protected]> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>> It has been almost 1 year since last release, I think it is time make
>>>>> a new release (guess 0.19) . If there is any blocking issue, pls put
>>>>> it here for discussion. So that we can resolve it quickly.
>>>>>
>>>>> --
>>>>> Sharique
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"DbLinq" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/dblinq?hl=en
-~----------~----~----~----~------~----~------~--~---

[dblinq] Re: new release ?

Reply via email to