I’m working with Lucene 5.1 to try to make use of the relational structure of
the block join index and query mechanisms. I’m querying with the following
code:
IndexReader reader = DirectoryReader.open(index);
ToParentBlockJoinIndexSearcher searcher = new ToParentBlockJoinIndexSearcher(reader);
ToParentBlockJoinCollector collector =
        new ToParentBlockJoinCollector(Sort.RELEVANCE, 2, true, true);
// Filter passed to the join query as the parents filter
BitDocIdSetFilter codingScheme = new BitDocIdSetCachingWrapperFilter(
        new QueryWrapperFilter(
                new QueryParser("codingSchemeName",
                        new StandardAnalyzer(new CharArraySet(0, true)))
                        .parse(scheme.getCodingSchemeName())));
// Child query against the propertyValue field
Query query = new QueryParser(null, new StandardAnalyzer(new CharArraySet(0, true)))
        .createBooleanQuery("propertyValue", term.getTerm(), Occur.MUST);
ToParentBlockJoinQuery termJoinQuery = new ToParentBlockJoinQuery(
        query,
        codingScheme,
        ScoreMode.Avg);
searcher.search(termJoinQuery, collector);
I'm trying to retrieve the parent documents, but the search fails on the final
line with the following stack trace:
Exception in thread "main" java.lang.IllegalStateException: child query must only match non-parent docs, but parent docID=2147483647 matched childScorer=class org.apache.lucene.search.TermScorer
    at org.apache.lucene.search.join.ToParentBlockJoinQuery$BlockJoinScorer.nextDoc(ToParentBlockJoinQuery.java:330)
    at org.apache.lucene.search.join.ToParentBlockJoinIndexSearcher.search(ToParentBlockJoinIndexSearcher.java:63)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:428)
    at org.lexevs.lucene.prototype.LuceneQueryTrial.luceneToParentJoinQuery(LuceneQueryTrial.java:78)
    at org.lexevs.lucene.prototype.LuceneQueryTrial.main(LuceneQueryTrial.java:327)
I build indexes of up to about 36 GB using code similar to the following:
List<Document> list = new ArrayList<Document>();
//need a static
int staticCount = count;
ParentDocObject parent = builder.generateParentDoc(cs.getCodingSchemeName(),
        cs.getVersion(), cs.getURI(), "description");
if (cs.codingSchemeName.equals(CodingScheme.THESSCHEME.codingSchemeName)) {
    //One per coding Scheme
    int numberOfProperties = 12;
    if (!thesExactMatchDone) {
        ChildDocObject child1 =
                builder.generateChildDocWithSalt(parent, SearchTerms.BLOOD.getTerm());
        Document doc1 = builder.mapToDocumentExactMatch(child1);
        list.add(doc1);
        count++;
        numberOfProperties--;
        ChildDocObject child =
                builder.generateChildDocWithSalt(parent, SearchTerms.CHAR.term);
        Document doc = builder.mapToDocumentExactMatch(child);
        count++;
        list.add(doc);
        numberOfProperties--;
        thesExactMatchDone = true;
    }
    while (numberOfProperties > 0) {
        if (count % 547 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                    builder.randomTextGenerator(
                            builder.randomNumberGenerator(), SearchTerms.BLOOD.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 233 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                    builder.randomTextGenerator(
                            builder.randomNumberGenerator(), SearchTerms.CHAR.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 71 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                    builder.randomTextGenerator(
                            builder.randomNumberGenerator(), SearchTerms.ARTICLE.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 2237 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                    builder.randomTextGenerator(
                            builder.randomNumberGenerator(), SearchTerms.LUNG_CANCER.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 5077 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                    builder.randomTextGenerator(
                            builder.randomNumberGenerator(), SearchTerms.LIVER_CARCINOMA.getTerm()));
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 2371 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                    builder.randomTextGeneratorStartsWith(
                            builder.randomNumberGenerator(), SearchTerms.BLOOD.getTerm()));
            Document doc = builder.mapToDocumentExactMatch(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 79 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                    builder.randomTextGeneratorStartsWith(
                            builder.randomNumberGenerator(), SearchTerms.ARTICLE.getTerm()));
            Document doc = builder.mapToDocumentExactMatch(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 3581 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                    builder.randomTextGeneratorStartsWith(
                            builder.randomNumberGenerator(), SearchTerms.LUNG_CANCER.getTerm()));
            Document doc = builder.mapToDocumentExactMatch(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else if (count % 23 == 0) {
            ChildDocObject child = builder.generateChildDocWithSalt(parent,
                    builder.randomTextGeneratorStartsWith(
                            builder.randomNumberGenerator(), SearchTerms.CHAR.getTerm()));
            Document doc = builder.mapToDocumentExactMatch(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        } else {
            ChildDocObject child = builder.generateChildDoc(parent);
            Document doc = builder.mapToDocument(child);
            list.add(doc);
            count++;
            numberOfProperties--;
        }
    }
}
// Children were added first; the parent document goes last in the list
Document par = builder.mapToDocument(parent);
list.add(par);
// Index the children and their parent together in a single call
writer.addDocuments(list);
}
This works pretty well until I scale it up by running several instances of it.
When the nextChildDoc retrieved reaches id 5874902, this line in
ToParentBlockJoinQuery
parentDoc = parentBits.nextSetBit(nextChildDoc);
assigns the value 2147483647 to parentDoc. If I understand Lucene and Luke
correctly, that is not a document id in my index, since the index has only
42716877 documents.
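For what it's worth, 2147483647 is Integer.MAX_VALUE, which as far as I can
tell is also the value of DocIdSetIterator.NO_MORE_DOCS, so my guess is that
nextSetBit found no parent bit at or after that child doc. Below is a minimal
stand-alone sketch of that behavior. It uses java.util.BitSet rather than
Lucene's own bit set, and the class name, the nextParent helper, and the doc
ids are all made up for illustration:

```java
import java.util.BitSet;

public class ParentLookupSketch {
    // Stand-in sentinel for "no more docs"; equal to Integer.MAX_VALUE (2147483647).
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Returns the first marked parent doc id at or after childDoc,
    // or NO_MORE_DOCS when no parent bit is set from there on.
    public static int nextParent(BitSet parentBits, int childDoc) {
        int next = parentBits.nextSetBit(childDoc); // java.util.BitSet returns -1 if none
        return next < 0 ? NO_MORE_DOCS : next;
    }

    public static void main(String[] args) {
        BitSet parentBits = new BitSet();
        parentBits.set(3); // hypothetical block 1: children 0..2, parent 3
        parentBits.set(7); // hypothetical block 2: children 4..6, parent 7
        // Docs 8 and up are children whose parent is NOT marked in parentBits.

        System.out.println(nextParent(parentBits, 5)); // prints 7
        System.out.println(nextParent(parentBits, 8)); // prints 2147483647
    }
}
```

If that reading is right, the question becomes why a matching child doc has no
parent bit after it in the set produced by my codingScheme filter.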
Can someone shed some light on this exception?
Thanks,
Scott Bauer