Re: [ADVANCED-DOTNET] Transparent Persistence

Frans Bouma Sat, 14 Feb 2004 05:28:04 -0800

> >         Whoa, who's the one who doesn't understand what system
> > design is all about? :-S
> 
> Ouch. That hurts. It really does. But---fortunately---you 
> misunderstood me. ;)


        Hmmm, what I read in your statement was not that vague and I
have the feeling I understood you quite well.

> Modeling is hard. But let's assume that we've managed to do a 
> good job here and that we came up with a very good model for 
> a specific domain.
> Furthermore, let's assume that we've used Halpin's ORM, with 
> which I'm quite familiar actually ;).

        good! :) 

> From this model, we come up with a) an object-model design 
> for our application layer, and b) a relational-database 
> schema for our data layer.
> These are really two distinct tasks! Consider for example how 
> many-to-many relationships are modelled in database schemas: 
> we need auxiliary tables there. Well, there will be no such 
> things as auxiliary classes in our application-layer object 
> model. So, our basic model is still good, but different 
> layers need different incarnations of it.

        As I also replied to Jeff, this is not a correct conclusion.
Order - Product is an m:n relationship, but the intermediate table,
OrderRow, is in fact a relation: an objectified relationship. (Relation
vs. relationship: in Dutch / German we use the same word for both,
which is a source of a lot of misunderstandings. I make this mistake a
lot too, I think even in this email :D) 

        There are of course m:n relationships which are not objectified.
Their intermediate tables are not considered entities (although by
definition they are). Let's say you write code with this setup. You then
have some m:n relationships whose intermediate table is seen as an
entity, and other m:n relationships whose intermediate table is not.

        Let's go through a simple example. Employees and Departments.
Say these have an m:n relationship so employees can work in multiple
departments (they really work hard :)). We then need an intermediate
table, which is not really an entity, and this is DepartmentEmployees. 

        In our code, when we want to add an employee to a department, we
do this (pseudocode)

// create new empty employee entity object
EmployeeEntity newEmployee = new EmployeeEntity();
// fill in properties of this employee
myDepartment.Employees.Add(newEmployee);
// save the department and with that the employee, in 1 transaction
myDepartment.Save();

        Nice code, tiny, clean. 

        Now, the manager thinks he needs to know when each employee
started at which department. To model this correctly, you have to
objectify the department-employee relationship into a relation, and add
a non-pk, mandatory attribute 'StartDate'. 

        Now, go back to the code snippet I just posted. It definitely
will not work anymore: you need to set StartDate as well, as it is
mandatory. You also can't add the employee directly to Employees in the
department object, as it is now related via a 3rd entity. So to reach
the department from an employee, the developer has to do something like:
myEmployee.DepartmentEmployees[index].Department
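
        A minimal sketch of what the objectified version could look
like, for illustration only. The classes below (EmployeeEntity,
DepartmentEntity, DepartmentEmployeeEntity) are hypothetical in-memory
stand-ins, not what any particular O/R mapper generates, and no database
is involved:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for generated entity classes; no database involved.
public class EmployeeEntity
{
    public List<DepartmentEmployeeEntity> DepartmentEmployees { get; } =
        new List<DepartmentEmployeeEntity>();
}

public class DepartmentEntity
{
    public List<DepartmentEmployeeEntity> DepartmentEmployees { get; } =
        new List<DepartmentEmployeeEntity>();
}

// The objectified relation: an entity of its own, carrying StartDate.
public class DepartmentEmployeeEntity
{
    public DepartmentEmployeeEntity(
        DepartmentEntity department, EmployeeEntity employee, DateTime startDate)
    {
        Department = department;
        Employee = employee;
        StartDate = startDate; // mandatory, so the constructor demands it
        department.DepartmentEmployees.Add(this);
        employee.DepartmentEmployees.Add(this);
    }

    public DepartmentEntity Department { get; }
    public EmployeeEntity Employee { get; }
    public DateTime StartDate { get; }
}

public static class ObjectifiedRelationDemo
{
    // Links a new employee to a department through the intermediate entity,
    // then navigates back from the employee to the department.
    public static bool Run()
    {
        var myDepartment = new DepartmentEntity();
        var newEmployee = new EmployeeEntity();
        var link = new DepartmentEmployeeEntity(
            myDepartment, newEmployee, new DateTime(2004, 2, 14));
        return ReferenceEquals(
            newEmployee.DepartmentEmployees[0].Department, myDepartment)
            && link.StartDate.Year == 2004;
    }
}
```

Note that the department is reached via DepartmentEmployees[0].Department,
the same navigation path as in the line above the sketch.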

        This example is still obvious, as StartDate is mandatory. But
let's say it is NON mandatory at first. You can keep the code as it was
first posted, because StartDate will then simply be NULL. However, when
StartDate becomes mandatory, you have to alter the code, as it will
throw a 'can't be null' exception or similar for the StartDate field.

        What does this mean? Well, you have to know EXACTLY which non-pk
fields in an objectified relation are mandatory and which are not.
Instead of moving away from the database you move closer to the
database. It also introduces inconsistency: some m:n relationships have
to be treated this way, others in a totally different way. 

        You can avoid this by declaring m:n relations read-only and
always recognizing intermediate entities. This greatly reduces the
complexity of the code, as it doesn't require you to know when you can
do Employees.Add(newEmployee) and when you can't: it is ALWAYS the same. 
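
        One way such a read-only m:n relation could be sketched, under
the assumption of plain in-memory classes. Employee, DepartmentEmployee
and Department here are hypothetical names, not the output of a real
mapper. The only write path is the intermediate entity, and the m:n view
is derived from it:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public string Name { get; set; } = "";
}

// Intermediate entity. Callers always go through this class, whether or
// not StartDate is mandatory in the underlying table.
public class DepartmentEmployee
{
    public Employee Employee { get; set; } = new Employee();
    public DateTime? StartDate { get; set; }
}

public class Department
{
    // The single write path: the intermediate entities.
    public List<DepartmentEmployee> DepartmentEmployees { get; } =
        new List<DepartmentEmployee>();

    // Read-only m:n view derived from the intermediate entities.
    // IReadOnlyList<T> has no Add, so Employees.Add(...) does not compile.
    public IReadOnlyList<Employee> Employees =>
        DepartmentEmployees.Select(link => link.Employee).ToList();
}

public static class ReadOnlyRelationDemo
{
    // Adds one employee via the intermediate entity and returns how many
    // employees the read-only view then exposes.
    public static int Run()
    {
        var department = new Department();
        department.DepartmentEmployees.Add(new DepartmentEmployee
        {
            Employee = new Employee { Name = "Alice" },
            StartDate = new DateTime(2004, 2, 14)
        });
        return department.Employees.Count;
    }
}
```

If StartDate later flips from NULL to NOT NULL, the calling code keeps
the same shape; only the value has to be supplied.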

        You are of course fully entitled to write complex software,
however I've learned over the years that when you spend a little more
time on how to make things simpler instead of more complex, the
application will be more maintainable and easier to build. 

> No problem so far: nothing but 'good design'. Over to 'bad design' ...

        No, not good design. A lot of m:n relationships run through
objectified relationships. Every table with at least one pair of FK
fields is an intermediate table in an objectified m:n relationship! This
then results in a big inconsistency in your software: SOME intermediate
tables in m:n relations are saved automatically and OTHERS are not. Can
you tell, by looking at Department.Employees, whether that container
holds m:n related objects which are related through an objectified
relationship or not? No, you can't; you need database information, very
detailed information. Furthermore, it makes code inconsistent: at one
time it has to be written like this, and at another time, because deep
below a non-pk attribute is suddenly mandatory, your code has to be
written totally differently! I'm sorry, but I find that kind of
inconsistency, which causes code breaks far away from the database (you
just flip a 'NULL' into 'NOT NULL'), 'bad' design. 

        Consistency, that's what makes software work. Look for schemes
that always work, and apply them when appropriate. Always implement
functionality ABC as A -> B -> C, and never in another order, so you
always know you're doing it correctly if you follow A -> B -> C. Very
easy to learn for each developer: learn the steps, take them, and you're
done. Fewer errors, fewer bugs, less trouble, better software. 

        Remember: no-one is going to laugh at you because you wrote
consistent software, except the few people who think HOW is more
important than WHY. You know, the people who think it is key to write
'pure' OO and think that by doing so the application is written better.
They forget that pure OO definitely needs translation layers along the
way, and every translation layer weakens the connection between the
functional design and the actual result in code ("the functionality
description in program text"). That in turn hurts maintainability: it
becomes hard to tell where functionality ABC is implemented, as there
is no direct 1:1 projection of ABC onto the code or vice versa.

> And this is something I've encountered in real life many times 
> now. First of all, it seems tempting to forget about coming 
> up with a good basic model and just write out a database 
> schema. I guess I don't need to tell you what's wrong about 
> that. :) But, even if we could convince our design team to use 
> the ORM model we've carefully composed, danger is just around 
> the corner.
> I've worked with teams that spent a lot of time in designing 
> the database and then rushed forward and wrote 
> application-layer class definitions that were just one-on-one 
> translations of the database tables. So, there were, for 
> instance, auxiliary classes showing up to model many-to-many 
> relationships! This is what I mean by 'being able to see 
> the database through all layers'. This is what I consider 
> 'bad design'.
> I hope you agree with me about this.

        No, I definitely don't agree with this, and I hope you now
understand why. 

        But don't let that stop you creating inconsistent software which
falls apart the second you flip a not null constraint. Look, I spent
serious time solving this issue, as I had to solve it in my schema
retriever in LLBLGen Pro. I did some tests where I had the intermediate
tables hidden and where I didn't have them hidden, and checked what
would happen if the datamodel changes (and against every law in the
universe, a lot of people like to mess with their datamodel on a daily
basis, as it seems). After those tests, and after looking at the code
necessary to solve the issue of saving m:n related objects, I could only
conclude that the intermediate table should be known, always. That keeps
code consistent and keeps developers focussed, instead of confusing them
because a construct is suddenly totally different due to a non-null
non-pk attribute.

        And why is this all a bad thing? Because it shows the database
model? Why is that bad? Do you consider this code bad? ->

myOrder.CustomerID = _customerID;

and this good:

myOrder.Customer = myCustomer;

?

If so, you require the fetch of the Customer object to work with the
order. However, HOW do you know you have to set the FK field
'CustomerID' in Order to relate Order to Customer and to be ABLE to
save the order? You can't know that UNLESS you know the facts about the
relational model. The underlying code can't invent some customer data if
you forget to set it in Order. In other words: a developer needs to know
the relational model to work with objects from the database.
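
        To illustrate that both forms end up at the same FK field, here
is a small sketch. It assumes hypothetical Customer and Order classes
held in memory; how a real mapper keeps the object reference and the FK
in sync may differ:

```csharp
using System;

public class Customer
{
    public int CustomerID { get; set; }
}

public class Order
{
    private Customer _customer; // stays null until a Customer object is assigned
    private int _customerID;

    // Setting the FK directly needs no Customer object at all.
    public int CustomerID
    {
        get { return _customerID; }
        set
        {
            _customerID = value;
            // Drop a cached Customer that no longer matches the FK.
            if (_customer != null && _customer.CustomerID != value)
            {
                _customer = null;
            }
        }
    }

    // Setting the object still has to fill the FK field underneath.
    public Customer Customer
    {
        get { return _customer; }
        set
        {
            _customer = value;
            _customerID = value.CustomerID;
        }
    }
}

public static class ForeignKeyDemo
{
    // Relates two orders to customer 42, once by key and once by object,
    // and reports whether both orders carry the same FK value.
    public static bool Run()
    {
        var byKey = new Order();
        byKey.CustomerID = 42;

        var byObject = new Order();
        byObject.Customer = new Customer { CustomerID = 42 };

        return byKey.CustomerID == byObject.CustomerID;
    }
}
```

Either way the order can only be saved once CustomerID has a value, so
the relational fact is still there; the second form merely hides where
it gets set.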

        Software that is written in real projects has to work, always,
has to be easily maintainable and easily extensible, now and in the
(near) future, perhaps by other people. Solid software is written by
keeping things simple (but not simpler! :)) and consistent, so you know
what to expect and why and it is never 'suddenly different'. Those are
aspects of real-life software development which have top-most priority.
A lot of (mostly younger) developers forget about this and think solely
in code constructs and eye only a vague goal like 'the best object
model' or 'pure OO' or 'the solution without a single line of db related
code'. That's fine, as long as their manager knows that and tells them
to wake up and use other priorities at work. 

        IMHO, you are too focussed on the low(er) level aspects of
software development and want to use them to solve high(er) level
aspects of software development. What about the great designs in
Microsoft Business Framework? Example: SalesOrder. This is a class which
aggregates Customer, Order and, inside Order, OrderRows. Now you have a
new object to work with, and it hides relational model aspects to a
great extent but not all of them (it still requires knowledge about
customer being related to order, for example, and customer being related
to salesorder).

        This is however on a level higher than the O/R mapper. It
requires a new layer on top of the layer generated / provided by the O/R
mapper, one which solely provides BL objects which are combinations of
lower level objects provided by the O/R mapper. Do you really think it
matters to these higher order objects if the intermediate table is
there? They want it there, so that the code in these objects is simple
and doesn't have to take into account situations where the intermediate
table is 'suddenly hidden'. 
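
        A rough sketch of such a higher-level object, built on top of
lower-level entities. This is not the Microsoft Business Framework API;
every class and member below is a hypothetical illustration with
in-memory data only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Lower-level entities, as an O/R mapper layer might expose them.
public class Customer
{
    public int CustomerID { get; set; }
}

public class OrderRow
{
    public int ProductID { get; set; }
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
}

public class Order
{
    public int CustomerID { get; set; }
    public List<OrderRow> OrderRows { get; } = new List<OrderRow>();
}

// Higher-level BL object composed from the entities above.
public class SalesOrder
{
    public SalesOrder(Customer customer)
    {
        Customer = customer;
        Order = new Order { CustomerID = customer.CustomerID };
    }

    public Customer Customer { get; }
    public Order Order { get; }

    // The intermediate entity (OrderRow) is always visible to this layer,
    // so adding a product is the same construct in every case.
    public void AddProduct(int productId, int quantity, decimal unitPrice)
    {
        Order.OrderRows.Add(new OrderRow
        {
            ProductID = productId,
            Quantity = quantity,
            UnitPrice = unitPrice
        });
    }

    public decimal Total => Order.OrderRows.Sum(row => row.Quantity * row.UnitPrice);
}

public static class SalesOrderDemo
{
    // Builds a sales order with two rows and returns its total.
    public static decimal Run()
    {
        var salesOrder = new SalesOrder(new Customer { CustomerID = 1 });
        salesOrder.AddProduct(100, 2, 10.00m);
        salesOrder.AddProduct(200, 1, 5.50m);
        return salesOrder.Total;
    }
}
```

Callers of SalesOrder never touch OrderRow themselves, yet the class
itself stays simple precisely because OrderRow is always there.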

        Consistency and simplicity. Two good friends you want to keep
for life, trust me.

                FB
