Essentially this would just comprise triples encoded in a document as fields.
So for example we would have subject predicate and object relationships as document fields. Subject and predicates would be Tokens and then the object field would be indexed.
For example a triple (document) would be:
http://jakarta.apache.org -> title -> "A great Java developer's website"
This would be just one document in the index.
This would have a lot of advantages most importantly speed and the reliability of Lucene and the ability to run a full text query on objects.
For example we could query on "Java" and get back "http://jakarta.apache.org"
The major downside I could see is that this would mean that we would be indexing a LOT of small documents with a LOT of index updates.
Can anyone see any problems here? This database will eventually grow to around 2TB in the next month so performance issues are non-trivial.
Most people have deployed Lucene with large document sizes and the fact that most people are citing document COUNT makes me nervous.
Kevin
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
