In this post I would like to explain some ideas about the 'complex
projections' problem. Maybe the problem is controlled but I don't think so.
Nonetheless if it were controlled, a discussion with comments would be
interesting to contrast ideas and clarify concepts.

1. Introduction
2. DbLinq implementation state
3. Where and how to begin

== Introduction ==

I understand as "complex projections" those projections which contains
associations accesses (AKA EntityRefs & EntitySets) or contains not-scalar
nested queries.

example:
db.Orders.Select(o=>new {o.Employee, o.OrderDetails,
o.OrderDetails.Select(od=> od.ProductID});

In the above example you can see the three cases:
· An association many to one (EntityRef or parent association)
· An association one to many (EntitiSet or children association)
· A not-scalar nested query.

IMO this kind of projections are one of the most important features that
Linq2Sql (and others O/R mappers) offers. This kind of queries allow us to
retrieve hierarchied information (a tree) vs a sql queries, which return
plain information (a table)

What I mean with a tree structure?

For example db.Employees.Select(e=>e.Orders) expression returns a
IEnumerable<EntitySet<Order>> and this type has got a tree structure.

AFAIK DbLinq currently doesn't support it yet, or at least it crashes in
such cases. To implement it, probably we will need to make modifications
into the CutOutOperands* stage and QueryRunner class, specially for
EntitySet & not-scalar nested queries. Why? because the information of each
tuple (of type EntitySet<Order>) is in more than one row.

[* Have we got a better name for this stage? someone with a better English
that mine?]
Let's to see how MS Linq2Sql translates the above expression:

SELECT [t1].[OrderID], [t1].[CustomerID], [t1].[EmployeeID],
[t1].[OrderDate], [t1].[RequiredDate], [t1].[ShippedDate], [t1].[ShipVia],
[t1].[Freight], [t1].[ShipName], [t1].[ShipAddress], [t1].[ShipCity],
[t1].[ShipRegion], [t1].[ShipPostalCode], [t1].[ShipCountry], (
    SELECT COUNT(*)
    FROM [dbo].[Orders] AS [t2]
    WHERE [t2].[EmployeeID] = [t0].[EmployeeID]
    ) AS [value]
FROM [dbo].[Employees] AS [t0]
LEFT OUTER JOIN [dbo].[Orders] AS [t1] ON [t1].[EmployeeID] =
[t0].[EmployeeID]
ORDER BY [t0].[EmployeeID], [t1].[OrderID]

the results are:

1 AIRBU 1 2008-09-13 21:55:23.703 NULL NULL NULL 21 NULL NULL NULL NULL NULL
NULL 5
2 BT___ 1 2008-09-13 21:55:23.703 NULL NULL NULL 11 NULL NULL NULL NULL NULL
NULL 5
3 BT___ 1 2008-09-13 21:55:23.703 NULL NULL NULL 12 NULL NULL NULL NULL NULL
NULL 5
4 UKMOD 1 2008-09-13 21:55:23.703 NULL NULL NULL 33 NULL NULL NULL NULL NULL
NULL 5
5 BONAP 1 1996-10-16 00:00:00.000 1996-11-27 00:00:00.000 1996-10-21
00:00:00.000 NULL 10 Bon app' 12, rue des Bouchers Marseille NULL NULL
France 5
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 0
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 0

Interesting points:
- The last column (called 'value') informs to the number of 'children
orders' given an employee.
- Last two rows represent two employees without any child order
As we have seen linq2sql build a tree structure from the table results using
the auxiliary column 'value'.
- Employee info is not retreived since it is not used.

== DbLinq implementation state ==

We have several problems.

* Let's to analyze the above example: db.Employees.Select(e=>e.Orders)

DbLinq generates the following sql query:

SELECT [OrderID], [CustomerID], [EmployeeID], [OrderDate], [RequiredDate],
[ShippedDate], [ShipVia], [Freight], [ShipName], [ShipAddress], [ShipCity],
[ShipRegion], [ShipPostalCode], [ShipCountry]
FROM [dbo].[Orders]
WHERE e$.[EmployeeID] = [EmployeeID]

Besides, the ExpressionDispatcher.Analyze don't work properly for this
expression given that it returns a TableExpression of Order (at
QueryBuilder.BuildExpressionQuery line:225).

Therefore CutOutOperands creates a reader of type:
Func<IDataRecord,MappingContext,Order> when it should be
Func<IDataRecord,MappingContext,EntitySet<Order>> so a casting fails in
QueryRunner stage.

Three conclusions here:
- EntitySet member access should not be transformed to TableExpression.
Other way would be to add to the TableExpression class more information for
detect this case after, at the CutOutOperands stage.
- CutOutOperands method  isn't fully implemented.
- Sql query doesn't provides enough information for rebuilding a
IEnumerable<EntitySet<Order>> structure.

* Let's to analyze another example: db.Employees.Select(e=>new {e.Orders})

This example crashes in NewExpressionMutator.Mutate(IList<Expression>). The
problem is that the "e.Orders expression" is translated to a TableExpression
(like the above case). Therefore an exception is thrown because there is no
constructor with a TableExpression-parameter in the anonymous type (a
similar problem of the "left-outer-join problem" and the Equal expression):

Expression.New(NewExpression.Constructor, operands, NewExpression.Members);

Conclusions here:
* EntitySet member access expression currently work inside of a "new
expression", (Will work EntityRefs inside of a "new expression"? I don't
know, I would like check it)
* We currently support to project scalar expressions and ColumnExpressions
inside of a "new expression"
* Likely non-scalar subqueries inside of a "new expression" currently don't
work properly.

There are another samples with other different problems, but I think that
both examples are enough for now.

== Where and how to begin ==

I have two proposals about how to start fixing such problems:

A) An option is to use lazy loading to make the a generic and simple
implementation (with the cost of a worse performance).

A.1- At ExpressionDispatcher.Analyze.MemberAccess. if the access is of type
EntitySet<Z> it should be replaced as EntitySetColumnExpression. (like
simpler member access expressions are replaced as ColumnExpressions.)

example: db.Employees.Select (e=>e.Orders.Select(o=>new{o.OrderDetails})
---> db.Employees.Select (e=>new {EntitySetColumnExpression})

A.2-
· EntitySetColumnExpression should contains a reference to the original
subexpression generated by the c# compiler, I'll call to such subexpression
"G".
· EntitySetColumnExpression should also contains a ColumnExpression of the
PK column of the parent entity.

example:
esColumn.SourceExpression = Call(Select, (o=>new {o.OrderDetails}))) // this
expression is called G
esColumn.PseudoColumn = new ColumnExpression(TableX, name,...) (where
name="EmployeeID")

A.3- The sql query is generated. Such EntitySetColumnExpressions will be
generated to sql using the PseudoColumn (ColumnExpressions)

example:
it should be something like: SELECT $t1.EmployeID FROM Employee $t1

A.4- The rowCreator should consider that all EntitySeColumnExpressions
should be converted to EntitySet<Z> type.

var currentPK = row.GetIntegerAt(index);
var value = new EntitySet<Z>();
value.SetSource( G(db.ZTable.Where(o=>o.FK=currentPk)));

Theoretically this should work even in the most complex nested expressions.
However this implementation would be worse in performance, example:

var orders=db.Employees.Select(e=>e.Orders); // Database contains 100
employees
foreach(var o in orders)
{
    Console.WriteLine(o.Count());
}

The above piece of code make 101 database accesses.
Nonetheless I think that such implementation would be a very good first
approximation.

B) Not to use lazy loading. I mean, retrieve all the information in a only
sql query. Like Linq2Sql does.

B.1 - We could include into the TableExpression some mark wich told us the
TableExpression was originally an EntitySet member access expression.

B.2 - We also need to change the generated sql. We need an auxiliar column
that allows us build the object-tree.
Two choices here:
            * Using a count expression column with the number of children of
the parent entity (like MS linq2sql)
            * Project PK column of the parent entity (in our example
Employees.EmployeeID).

C) Your proposal

Corrections, suggestions, ideas, all are welcome. Remember that theses
discussions are also documentation and reference for the project.

Regards.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"DbLinq" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/dblinq?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to