sonatype-lift[bot] commented on a change in pull request #427:
URL: https://github.com/apache/lucene/pull/427#discussion_r742771217
##########
File path:
lucene/queries/src/java/org/apache/lucene/queries/intervals/Intervals.java
##########
@@ -429,4 +444,300 @@ public static IntervalsSource after(IntervalsSource
source, IntervalsSource refe
source,
Intervals.extend(new OffsetIntervalsSource(reference, false), 0,
Integer.MAX_VALUE));
}
+
+ /**
+ * Returns intervals that correspond to tokens from a {@link TokenStream}
returned for {@code
+ * text} by applying the provided {@link Analyzer} as if {@code text} was
the content of the given
+ * {@code field}. The intervals can be ordered or unordered and can have
optional gaps inside.
+ *
+ * @param text The text to analyze.
+ * @param analyzer The {@link Analyzer} to use to acquire a {@link
TokenStream} which is then
+ * converted into intervals.
+ * @param field The field {@code text} should be parsed as.
+ * @param maxGaps Maximum number of allowed gaps between sub-intervals
resulting from tokens.
+ * @param ordered Whether sub-intervals should enforce token ordering or not.
+ * @return Returns an {@link IntervalsSource} that matches tokens acquired
from analysis of {@code
+ * text}. Possibly an empty interval source, never {@code null}.
+ * @throws IOException If an I/O exception occurs.
+ */
+ public static IntervalsSource analyzedText(
+ String text, Analyzer analyzer, String field, int maxGaps, boolean
ordered)
+ throws IOException {
+ try (TokenStream ts = analyzer.tokenStream(field, text)) {
+ return analyzedText(ts, maxGaps, ordered);
+ }
+ }
+
+ /**
+ * Returns intervals that correspond to tokens from the provided {@link
TokenStream}. This
+ * is a low-level counterpart to {@link #analyzedText(String, Analyzer,
String, int, boolean)}.
+ * The intervals can be ordered or unordered and can have optional gaps
inside.
+ *
+ * @param tokenStream The token stream to produce intervals for. The token
stream may be fully or
+ * partially consumed after returning from this method.
+ * @param maxGaps Maximum number of allowed gaps between sub-intervals
resulting from tokens.
+ * @param ordered Whether sub-intervals should enforce token ordering or not.
+ * @return Returns an {@link IntervalsSource} that matches tokens acquired
from {@code
+ * tokenStream}. Possibly an empty interval source, never {@code null}.
+ * @throws IOException If an I/O exception occurs.
+ */
+ public static IntervalsSource analyzedText(TokenStream tokenStream, int
maxGaps, boolean ordered)
+ throws IOException {
+ CachingTokenFilter stream =
+ tokenStream instanceof CachingTokenFilter
+ ? (CachingTokenFilter) tokenStream
+ : new CachingTokenFilter(tokenStream);
+
+ TermToBytesRefAttribute termAtt =
stream.getAttribute(TermToBytesRefAttribute.class);
+ PositionIncrementAttribute posIncAtt =
stream.addAttribute(PositionIncrementAttribute.class);
+ PositionLengthAttribute posLenAtt =
stream.addAttribute(PositionLengthAttribute.class);
+
+ if (termAtt == null) {
+ return NO_INTERVALS;
+ }
+
+ // Phase 1: read through the stream and assess the situation:
+ // counting the number of tokens/positions and marking if we have any
synonyms.
+
+ int numTokens = 0;
+ boolean hasSynonyms = false;
+ boolean isGraph = false;
+
+ stream.reset();
+ while (stream.incrementToken()) {
Review comment:
*NULL_DEREFERENCE:* object `stream.iterator` last assigned on line 489
could be null and is dereferenced by call to `incrementToken()` at line 507.
(at-me [in a reply](https://help.sonatype.com/lift/talking-to-lift) with
`help` or `ignore`)
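
For context on why a finding like this can be a false positive, here is a minimal sketch of the lazy-initialization pattern that static analyzers often flag. It assumes the filter creates its iterator on the first call to `incrementToken()` rather than in `reset()`. The class `LazyCachingStream` and its methods are hypothetical stand-ins written for illustration; this is not the code of Lucene's `CachingTokenFilter`, and whether the same reasoning applies to the flagged line should be checked against the real source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a caching token filter; NOT Lucene's actual class.
final class LazyCachingStream {
  private final List<String> input;   // tokens of the wrapped stream
  private List<String> cache;         // null until the first incrementToken()
  private Iterator<String> iterator;  // null exactly as long as cache is null
  private String current;             // term exposed after a successful advance

  LazyCachingStream(List<String> input) {
    this.input = input;
  }

  // Rewinds over the cache if it exists; before the first read there is
  // nothing to rewind, so the iterator is legitimately still null here.
  void reset() {
    if (cache != null) {
      iterator = cache.iterator();
    }
  }

  // The guard below assigns the iterator before it is ever dereferenced,
  // which is the invariant a flow-insensitive analysis may fail to see.
  boolean incrementToken() {
    if (cache == null) {
      cache = new ArrayList<>(input);
      iterator = cache.iterator();
    }
    if (!iterator.hasNext()) {
      return false;
    }
    current = iterator.next();
    return true;
  }

  String term() {
    return current;
  }
}
```

In this model, calling `reset()` and then looping on `incrementToken()` never touches a null iterator, and a second `reset()` replays the cached tokens from the start.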
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]