Hi Donal,
We released SIREn [1], a plugin for Lucene that allows indexing and
querying of semi-structured data, a few days ago. Your use case seems to
match perfectly what SIREn can do.
SIREn enables the indexing of semi-structured data into a Lucene field,
and offers additional query components to build semi-structured queries
programmatically. SIREn currently indexes tabular data, i.e. data
composed of rows and columns.
For example, for your use case, you can create a SIREn field that will
contain the following table:
Course.name    Attendance.mandatory
----------------------------------
cooking        N
art            Y
Course.name is the first column of the SIREn table and
Attendance.mandatory the second. SIREn does not limit the number of
columns, so you can index information beyond Course.name and
Attendance.mandatory, for example Course.location or Professor.name.
Each row (a SIREn tuple) is one of your database entries. For example,
the first row is {cooking, N}. Student.name is indexed into a normal
Lucene field so that it can be retrieved. To summarize, your Lucene
document schema will look like:
Doc {
- name: Bob
- content: {cooking, N}, {art, Y}
}
The 'content' field is created using SIREn and indexes two tuples:
{cooking, N} and {art, Y}.
Then, using SIREn's query components, you can retrieve all documents
that match certain tuples, such as {cooking, Y}. In this example, the
query returns nothing, since no tuple contains both cooking and Y.
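To make the difference concrete, here is a minimal plain-Java sketch of
the two matching behaviours. It does not use the SIREn or Lucene API;
the class and method names (TupleMatchSketch, flatMatch, tupleMatch) are
made up for illustration, and the code only models the semantics
described above, assuming each row is a two-element array of
{Course.name, Attendance.mandatory}.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of SIREn or Lucene.
public class TupleMatchSketch {

    // Flat matching: each value only has to occur in SOME row. This is
    // what happens when the columns are indexed as independent
    // multi-valued fields of the same document.
    static boolean flatMatch(List<String[]> rows, String course, String mandatory) {
        boolean courseSeen = false;
        boolean mandatorySeen = false;
        for (String[] row : rows) {
            if (row[0].equals(course)) courseSeen = true;
            if (row[1].equals(mandatory)) mandatorySeen = true;
        }
        return courseSeen && mandatorySeen;
    }

    // Tuple matching: both values have to occur in the SAME row.
    static boolean tupleMatch(List<String[]> rows, String course, String mandatory) {
        for (String[] row : rows) {
            if (row[0].equals(course) && row[1].equals(mandatory)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Bob's 'content' field: {cooking, N}, {art, Y}
        List<String[]> bob = Arrays.asList(
            new String[] {"cooking", "N"},
            new String[] {"art", "Y"});

        System.out.println("flat  {cooking, Y}: " + flatMatch(bob, "cooking", "Y"));
        System.out.println("tuple {cooking, Y}: " + tupleMatch(bob, "cooking", "Y"));
        System.out.println("tuple {art, Y}:     " + tupleMatch(bob, "art", "Y"));
    }
}
```

Run against Bob's two rows, the flat check reports true for
{cooking, Y} (the behaviour you are seeing today), while the tuple
check reports false for {cooking, Y} and true for {art, Y}.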
You can have a look at the IMDB indexing and querying example [2]. It
shows how to index and query tabular data of this kind. If you need
help, feel free to ask on our mailing list.
[1] http://siren.sindice.com
[2] https://dev.deri.ie/confluence/display/SIREn/Indexing+and+Searching+Tabular+Data
Best Regards,
--
Renaud Delbru
Donal Murtagh wrote:
Hi,
I'm trying to use Lucene to query a domain that has the following structure
Student 1-------* Attendance *---------1 Course
The data in the domain is summarised below
Course.name   Attendance.mandatory   Student.name
-------------------------------------------------
cooking       N                      Bob
art           Y                      Bob
If I execute the query "+courseName:cooking AND +mandatory:Y" it
returns Bob, because Bob is attending the cooking course, and Bob is
also attending a mandatory course. However, what I *really* want to
query for is "students attending a mandatory cooking course", which in
this case would return nobody. Is it possible to formulate this as a
Lucene query?
For the sake of completeness, the domain classes
themselves are shown below. These classes are Grails domain classes,
but I'm using the standard Compass annotations and Lucene query syntax.
Thanks!
- Don
@Searchable
class Student {

    @SearchableProperty(accessor = 'property')
    String name

    static hasMany = [attendances: Attendance]

    @SearchableId(accessor = 'property')
    Long id

    @SearchableComponent
    Set<Attendance> getAttendances() {
        return attendances
    }
}

@Searchable(root = false)
class Attendance {

    static belongsTo = [student: Student, course: Course]

    @SearchableProperty(accessor = 'property')
    String mandatory = "Y"

    @SearchableId(accessor = 'property')
    Long id

    @SearchableComponent
    Course getCourse() {
        return course
    }
}

@Searchable(root = false)
class Course {

    @SearchableProperty(accessor = 'property', name = "courseName")
    String name

    @SearchableId(accessor = 'property')
    Long id
}