The second parameter passed to SpanCollector.collectLeaf() is the position, rather than an index of any kind, which I think is going to mess things up for you. But other than that, you've got the right idea. :-)
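To make the position point concrete, here is a minimal, Lucene-free sketch of the lookup step. It assumes the collector has filled a map keyed by token position (the int that collectLeaf() receives), and that the span's first and last positions are both collected leaves, which should hold for a SpanNearQuery over SpanTermQuery clauses but may not for other span types. The class name SpanOffsetsSketch, the helper charRange() and the int[] {start, end} representation are hypothetical names for illustration, not part of Lucene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; nothing here is a Lucene API.
class SpanOffsetsSketch {

    /**
     * Resolves a span to a character range.
     *
     * offsetsByPosition maps a token position to {startOffset, endOffset},
     * as a SpanCollector keyed on the position argument would record it.
     * startPosition is inclusive and endPosition is exclusive, matching
     * Spans.startPosition() and Spans.endPosition().
     */
    static int[] charRange(Map<Integer, int[]> offsetsByPosition,
                           int startPosition, int endPosition) {
        int[] first = offsetsByPosition.get(startPosition);
        int[] last = offsetsByPosition.get(endPosition - 1);
        if (first == null || last == null) {
            // Assumption violated: an edge of the span was not a collected leaf.
            throw new IllegalStateException(
                "no offsets collected for span edge " + startPosition + ".." + endPosition);
        }
        return new int[] { first[0], last[1] };
    }

    public static void main(String[] args) {
        String s = "the quick brown fox jumped over the lazy dog";

        // Offsets as they would be collected for the two matching terms:
        // "quick" at position 1, "fox" at position 3.
        Map<Integer, int[]> offsetsByPosition = new HashMap<>();
        offsetsByPosition.put(1, new int[] { 4, 9 });
        offsetsByPosition.put(3, new int[] { 16, 19 });

        // The span covers positions 1..3, so endPosition() would report 4.
        int[] range = charRange(offsetsByPosition, 1, 4);
        System.out.println(s.substring(range[0], range[1]));
    }
}
```

Because the map is keyed by position, it has to be cleared (or read) per span and per document; otherwise a later match at the same position overwrites an earlier one.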

Alan Woodward
www.flax.co.uk

On 3 Nov 2015, at 00:26, Allison, Timothy B. wrote:

> All,
>
> I'm trying to find all spans in a given String via stored offsets in Lucene
> 5.3.1. I wanted to use the Highlighter with a NullFragmenter, but that is
> highlighting only the matching terms, not the full Spans (related to
> LUCENE-6796?).
>
> My Current code iterates through the spans, stores the span positions in one
> array and gathers the character offsets via a SpanCollector in a Map<Integer,
> OffsetAttribute>. Is there a simpler way?
>
> Something like this:
>
> String s = "the quick brown fox jumped over the lazy dog";
> String field = "f";
> Analyzer analyzer = new StandardAnalyzer();
>
> SpanQuery spanQuery = new SpanNearQuery(
>     new SpanQuery[] {
>         new SpanTermQuery(new Term(field, "fox")),
>         new SpanTermQuery(new Term(field, "quick"))
>     },
>     3,
>     false
> );
>
> MemoryIndex index = new MemoryIndex(true);
>
> index.addField(field, s, analyzer);
> index.freeze();
>
> IndexSearcher searcher = index.createSearcher();
> IndexReader reader = searcher.getIndexReader();
> spanQuery = (SpanQuery) spanQuery.rewrite(reader);
> SpanWeight weight = (SpanWeight) searcher.createWeight(spanQuery, false);
> Spans spans = weight.getSpans(reader.leaves().get(0),
>     SpanWeight.Postings.OFFSETS);
>
> if (spans == null) {
>     //do something with full string
>     return;
> }
>
> OffsetSpanCollector offsetSpanCollector = new OffsetSpanCollector();
> List<OffsetAttribute> spanPositions = new ArrayList<>();
> while (spans.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
>     while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
>         OffsetAttributeImpl offsetAttribute = new OffsetAttributeImpl();
>         offsetAttribute.setOffset(spans.startPosition(),
>             spans.endPosition()-1);
>         spanPositions.add(offsetAttribute);
>         spans.collect(offsetSpanCollector);
>     }
> }
> Map<Integer, OffsetAttribute> charOffsets = offsetSpanCollector.getOffsets();
> //now iterate through the list of spanPositions and grab the character
> //offsets for the start and end tokens of each span from the charOffsets
> ...
>
> private class OffsetSpanCollector implements SpanCollector {
>     Map<Integer, Offset> charOffsets = new HashMap<>();
>
>     @Override
>     public void collectLeaf(PostingsEnum postingsEnum, int i, Term term)
>             throws IOException {
>
>         OffsetAttributeImpl offsetAttribute = new OffsetAttributeImpl();
>         offsetAttribute.setOffset(postingsEnum.startOffset(),
>             postingsEnum.endOffset());
>
>         charOffsets.put(i, offsetAttribute);
>     }
>
>     @Override
>     public void reset() {
>         //don't think I need to do anything with this?
>     }
>
>     public Map<Integer, OffsetAttribute> getOffsets() {
>         return charOffsets;
>     }
> }