How should we handle the JPQL DISTINCT keyword

Michael Dick Wed, 28 Oct 2009 15:03:55 -0700

Hi all,

Currently we handle the JPQL DISTINCT keyword as a 1:1 mapping to the SQL
DISTINCT keyword. So the following
JPQL:  "SELECT DISTINCT b FROM Book b WHERE b.title = 'Gone Sailing'"
may result in the following SQL
SQL : "SELECT DISTINCT t0.id, t0.dueDate, t0.title, . . . FROM Book t0 WHERE
t0.title = 'Gone Sailing'"


This works fine for most queries, but when I was looking into OPENJPA-894 I
noticed a problem with some relationships and the JOIN FETCH clause. A JOIN
FETCH looking like this :
JPQL: "SELECT b from Book b JOIN FETCH b.subjects WHERE b.title = 'Gone
Sailing'"
Has two effects :
   1. It eagerly loads b.subjects
   2. It returns (potentially) multiple references to the same book. One
reference to Book(id=1) for every subject associated with Book(id=1).

The resulting SQL may look like this (in this case Book is MxM with Subject)
:
SQL "SELECT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid, t2.name FROM
LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid = t1.BOOKS_OID INNER
JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE t0.title = 'Gone
Fishing'"

Lets say that the Book with title = 'Gone Sailing' has two subjects :
Outdoors and Sportsman. In that case the SQL will return two rows that look
something like this :

OID    DUEDATE    TITLE          BOOKS_OID  OID  NAME
----   ---------- ---------      ---------- ---- -----------
3      2009-11-11 Gone Fishing   3          12   Outdoors
3      2009-11-11 Gone Fishing   3          13   Sportsman

The fix for OPENJPA-894 generates a result list with two references to the
same Book(id=3).

If you only wanted eager fetching of b.subjects, and didn't want duplicates
a good first guess would be to add the DISTINCT keyword (I'm finally getting
back to the subject)

JPQL : "SELECT DISTINCT b from Book b JOIN FETCH b.subjects WHERE b.title =
'Gone Sailing'"
SQL :  "SELECT DISTINCT t0.oid, t0.dueDate, t0.title, t1.BOOKS_OID, t2.oid,
t2.name FROM LIBBOOK t0 INNER JOIN LIBSUBJECT_LIBBOOK t1 ON t0.oid =
t1.BOOKS_OID INNER JOIN LIBSUBJECT t2 ON t1.SUBJECTS_OID = t2.oid WHERE
t0.title = 'Gone Fishing'"

We'll still get the same two rows because the SQL DISTINCT keyword is
applied to all permutations of the columns - not just the oid (and if it did
only apply to the oid we wouldn't be eagerly loading b.subjects anyway). As
a result I think we'll have to use a SetBackedList (or otherwise enforce
distinct results after getting rows from SQL) as our ResultList.

I have a patch that does this, and resolves the use case I described above
(it's in the TestLibService unit test), but I'm open to any other ideas for
ways to resolve the problem.

Thanks,
-mike

How should we handle the JPQL DISTINCT keyword

Reply via email to