Hello everyone,
I'm working with a huge set of data that I have to feed to Lucene in
order to build the right indexes. To handle the load, I went about it
this way: bulk-import the data into the db, then pull it out page by
page and index it. I have two objects sharing an N-N relationship:
Declaration and Support, each containing the corresponding collection
property (Supports and Declarations respectively).
So the first thing I did was create the paging criteria to get the
data:
Dim C1 As DetachedCriteria = DetachedCriteria.For(GetType(Declaration)) _
    .SetFirstResult(pageNumber * itemsPerPage) _
    .SetMaxResults(itemsPerPage)
and then iterate over the Declarations brought back to index them.
The problem is that when I index a Declaration, I also index the
content of its Supports property (for ease of searching). So if I was
paging 100 Declarations, I would hit the database 101 times: once for
the initial query and once for each Supports collection.
"Aha! Typical N+1 problem", I said to myself after having been
repeatedly warned by NHProf, so I set out to query both datasets in
one go: the 100 paged Declarations and their corresponding Supports.
A quick search turned up the following solution:
Dim C1 As DetachedCriteria = DetachedCriteria.For(GetType(Declaration)) _
    .SetFirstResult(pageNumber * itemsPerPage) _
    .SetMaxResults(itemsPerPage) _
    .SetFetchMode("Supports", NHibernate.FetchMode.Eager)
But the query went from snappy to terribly sluggish. What's more, some
Declarations appeared several times in the results. "Aha! Typical
Cartesian product problem, I'll just add a distinct transformer".
Although the following criteria got rid of the duplicates, the query
was still really slow:
Dim C1 As DetachedCriteria = DetachedCriteria.For(GetType(Declaration)) _
    .SetFirstResult(pageNumber * itemsPerPage) _
    .SetMaxResults(itemsPerPage) _
    .SetFetchMode("Supports", NHibernate.FetchMode.Eager) _
    .SetResultTransformer( _
        New NHibernate.Transform.DistinctRootEntityResultTransformer())
And to boot, my paging was broken: I only got 15 Declarations instead
of the 100 I asked for.
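
The shrinking page is consistent with the row limit being applied to
the joined Declaration/Support rows before the duplicates are
collapsed. The following is only a sketch of that arithmetic in plain
VB.NET, without NHibernate; the data (20 declarations with 5 supports
each, page size 10) and all names are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module PagingOverJoinSketch
    Sub Main()
        ' Hypothetical joined result set: one row per (declaration, support) pair.
        Dim joinedRows As New List(Of Tuple(Of Integer, Integer))()
        For declarationId As Integer = 1 To 20
            For supportId As Integer = 1 To 5
                joinedRows.Add(Tuple.Create(declarationId, supportId))
            Next
        Next

        ' The page is meant to hold 10 declarations, but here the limit
        ' is applied to the joined rows (assumption being illustrated).
        Dim page = joinedRows.Skip(0).Take(10)

        ' Collapsing duplicate roots afterwards leaves fewer declarations
        ' than the page size.
        Dim distinctDeclarations As Integer = _
            page.Select(Function(row) row.Item1).Distinct().Count()

        Console.WriteLine(distinctDeclarations) ' prints 2
    End Sub
End Module
```

With 5 supports per declaration, the 10 rows of the page cover only 2
declarations, which matches the kind of shortfall described above.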
From what I've read, when one wants to load a complex hierarchy and
avoid costly Cartesian products, it is recommended to use a MultiQuery
or a MultiCriteria, so I set out to do it with a MultiCriteria query:
Dim C1 As DetachedCriteria = DetachedCriteria.For(GetType(Declaration)) _
    .SetFirstResult(pageNumber * itemsPerPage) _
    .SetMaxResults(itemsPerPage)
' I want the Supports whose Declarations contain a declaration
' from my first query
Dim C2 As DetachedCriteria = _
    DetachedCriteria.For(GetType(Support)) _
    .CreateCriteria("Declarations") _
    .Add(Subqueries.PropertyIn("Id", _
        NHibernate.CriteriaTransformer.Clone(C1).SetProjection(Projections.Id())))
Dim queryResults As IList = Session.CreateMultiCriteria() _
.Add(C1) _
.Add(C2) _
.List()
The query is quick this time, and I end up with the correct number of
Declarations. However, the list of Supports contains duplicates (I
have 13xx instances instead of the expected 180), and what's more,
when I access the Supports property of a Declaration, there's still a
call to the database, as if I hadn't loaded the Supports for each
Declaration. There is no wiring between the Declarations and the
Supports.
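
One possible explanation for the duplicates is that joining Support to
its Declarations yields one row per (Support, Declaration) link, so a
Support shared by several Declarations of the page comes back several
times. This is only a sketch of that assumption in plain VB.NET,
without NHibernate; the data (3 supports, each linked to 4
declarations) and all names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module JoinDuplicatesSketch
    Sub Main()
        ' Hypothetical link rows (supportId, declarationId): every one of
        ' 3 supports is attached to 4 declarations of the current page.
        Dim links As New List(Of Tuple(Of Integer, Integer))()
        For supportId As Integer = 1 To 3
            For declarationId As Integer = 1 To 4
                links.Add(Tuple.Create(supportId, declarationId))
            Next
        Next

        ' Selecting the root (the support) from each joined row returns
        ' it once per matching declaration.
        Dim returnedSupports As List(Of Integer) = _
            links.Select(Function(link) link.Item1).ToList()

        Console.WriteLine(returnedSupports.Count)              ' prints 12
        Console.WriteLine(returnedSupports.Distinct().Count()) ' prints 3
    End Sub
End Module
```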
I would like to find the correct loading technique, but I must say I
didn't find any examples for this particular problem. Does anyone have
any idea?
Thank you very much!
Samy / vrittis