[ https://issues.apache.org/jira/browse/LUCENE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630286#action_12630286 ]
Karl Wettin commented on LUCENE-1320:
-------------------------------------
Cool, thanks!
The only thing I could see is that you managed to remove a couple of <pre> tags.
I'll also leave this out of the commit:
{code:java}
Index: contrib/analyzers/src/java/org/apache/lucene/analysis/compound/hyphenation/PatternParser.java
===================================================================
--- contrib/analyzers/src/java/org/apache/lucene/analysis/compound/hyphenation/PatternParser.java (revision 694390)
+++ contrib/analyzers/src/java/org/apache/lucene/analysis/compound/hyphenation/PatternParser.java (working copy)
@@ -267,7 +267,7 @@
   // EntityResolver methods
   //
   public InputSource resolveEntity(String publicId, String systemId)
-    throws SAXException, IOException {
+    throws SAXException {
     return HyphenationDTDGenerator.generateDTD();
   }
{code}
> ShingleMatrixFilter, a three dimensional permutating shingle filter
> -------------------------------------------------------------------
>
> Key: LUCENE-1320
> URL: https://issues.apache.org/jira/browse/LUCENE-1320
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/analyzers
> Affects Versions: 2.3.2
> Reporter: Karl Wettin
> Assignee: Karl Wettin
> Priority: Blocker
> Fix For: 2.4
>
> Attachments: LUCENE-1320.patch, LUCENE-1320.txt, LUCENE-1320.txt,
> LUCENE-1320.txt
>
>
> Backed by a column-focused matrix that creates all permutations of shingle
> tokens in three dimensions, i.e. it handles multi-token synonyms.
> Could, for instance, in some cases be used to replace 0-slop phrase queries
> with something speedier.
> {code:java}
> Token[][][]{
> {{hello}, {greetings, and, salutations}},
> {{world}, {earth}, {tellus}}
> }
> {code}
> passes the following test with 2-3 grams:
> {code:java}
> assertNext(ts, "hello_world");
> assertNext(ts, "greetings_and");
> assertNext(ts, "greetings_and_salutations");
> assertNext(ts, "and_salutations");
> assertNext(ts, "and_salutations_world");
> assertNext(ts, "salutations_world");
> assertNext(ts, "hello_earth");
> assertNext(ts, "and_salutations_earth");
> assertNext(ts, "salutations_earth");
> assertNext(ts, "hello_tellus");
> assertNext(ts, "and_salutations_tellus");
> assertNext(ts, "salutations_tellus");
> {code}
> Contains both simpler and more complex tests that demonstrate offsets,
> position increments, payload boost calculation and construction of a matrix
> from a token stream.
> The matrix attempts to hog as little memory as possible by seeking no more
> than maximumShingleSize columns forward in the stream and clearing up unused
> resources (columns and unique token sets). It can still be optimized quite a
> bit though.
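
To make the permutation order in the quoted test easier to follow, here is a minimal standalone sketch of the idea. It is not the ShingleMatrixFilter code from the patch: the class name `ShingleMatrixSketch` and the method `shingles` are hypothetical, it works on plain strings rather than Lucene tokens, it holds the whole matrix in memory, and it ignores offsets, position increments and payloads. It assumes every column has at least one row.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the class from the LUCENE-1320 patch.
class ShingleMatrixSketch {

    /**
     * matrix[column][row][token]: each column holds alternative rows
     * (e.g. synonyms), and each row holds one or more tokens.
     */
    static List<String> shingles(String[][][] matrix, int minSize, int maxSize, String separator) {
        Set<String> seen = new LinkedHashSet<>();  // keeps first-seen order, drops duplicates
        int[] rowIndex = new int[matrix.length];   // currently selected row per column

        while (true) {
            // Flatten the selected row of every column into one token sequence.
            List<String> tokens = new ArrayList<>();
            for (int col = 0; col < matrix.length; col++) {
                for (String token : matrix[col][rowIndex[col]]) {
                    tokens.add(token);
                }
            }

            // Collect every n-gram of minSize..maxSize tokens, ordered by start position.
            for (int start = 0; start < tokens.size(); start++) {
                for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                    seen.add(String.join(separator, tokens.subList(start, start + size)));
                }
            }

            // Advance to the next permutation; the first column varies fastest.
            int col = 0;
            while (col < matrix.length && ++rowIndex[col] == matrix[col].length) {
                rowIndex[col++] = 0;
            }
            if (col == matrix.length) {
                return new ArrayList<>(seen);  // every row combination has been visited
            }
        }
    }

    public static void main(String[] args) {
        String[][][] matrix = {
            {{"hello"}, {"greetings", "and", "salutations"}},
            {{"world"}, {"earth"}, {"tellus"}}
        };
        for (String shingle : shingles(matrix, 2, 3, "_")) {
            System.out.println(shingle);
        }
    }
}
```

For the matrix in the description and shingle sizes 2 to 3, this sketch yields the same twelve shingles in the same order as the assertions in the quoted test. Unlike the filter described in the issue, it makes no attempt to bound memory by looking only maximumShingleSize columns ahead.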