GitHub user NightOwl888 commented on the issue:

    https://github.com/apache/lucenenet/pull/182
  
    Ok, this is now down to 23 failing tests.
    
    The 17 failing tests in Synonym are still no closer to being solved. 
I went over the SynonymMap and SynonymFilter classes line by line three times. Wherever 
the problem is, it is well hidden.
    
    After spending a whole day stepping through code, I finally found a clue - 
all of the failing tests are failing when the expected synonym input has a 
space in it. For example, TestMatching doesn't fail until [this 
line](https://github.com/NightOwl888/lucenenet/blob/analysis-bugz/src/Lucene.Net.Tests.Analysis.Common/Analysis/Synonym/TestSynonymMapFilter.cs#L875)
 when the first expected input is "z x c v". It is unclear how that input is supposed 
to match, though, since the tokenizer makes "z" a separate token, which causes 
the logic to exit at that point without ever comparing "z x", "z x c", and "z x 
c v". I searched online for a clue, but only found [this question on 
SO](http://stackoverflow.com/questions/17283100/lucene-synonym-filter-behavior) 
in which the poster is just as confused about it as I am.
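
    To make the question concrete, here is a minimal, self-contained sketch of 
what comparing "z x", "z x c", and "z x c v" against a multi-word rule would 
involve: the matcher has to keep looking ahead past the first token instead of 
stopping at "z". This is not the SynonymFilter code. The class name, the method 
names, the separator constant, and the use of a plain Set as the rule store are 
all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MultiTokenMatchSketch {
    // Hypothetical separator used to join several tokens into one lookup key.
    public static final char SEP = '\u0000';

    // Joins words into a single rule key, e.g. "z", "x" -> "z<SEP>x".
    public static String join(String... words) {
        return String.join(String.valueOf(SEP), words);
    }

    // Returns how many tokens, starting at 'start', are covered by the longest
    // matching rule, or 0 if no rule matches. The key grows by one token per
    // iteration, so a miss on "z" alone does not end the search.
    public static int longestMatch(List<String> tokens, int start,
                                   Set<String> rules, int maxTokens) {
        StringBuilder key = new StringBuilder();
        int best = 0;
        for (int i = start; i < tokens.size() && i - start < maxTokens; i++) {
            if (i > start) {
                key.append(SEP);
            }
            key.append(tokens.get(i));
            if (rules.contains(key.toString())) {
                best = i - start + 1;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> rules = new HashSet<>(Arrays.asList(join("z", "x", "c", "v")));
        List<String> tokens = Arrays.asList("z", "x", "c", "v", "b");
        System.out.println(longestMatch(tokens, 0, rules, 4)); // prints 4
        System.out.println(longestMatch(tokens, 1, rules, 4)); // prints 0
    }
}
```

    If the ported filter gives up as soon as the single token "z" has no rule 
of its own, it would never reach the four-token key, which is consistent with 
the failures described above.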
    
    I also took another look at the 5 failing tests in the Compound namespace. I went 
over everything line by line, then tried stepping through the code. However, 
I don't know what the code is supposed to do, only what the expected 
output is. In [this 
test](https://github.com/NightOwl888/lucenenet/blob/analysis-bugz/src/Lucene.Net.Tests.Analysis.Common/Analysis/Compound/TestCompoundWordTokenFilter.cs#L84),
 the first output succeeds. The second output is expected to be "ba". The first 
token [comes back as 
"b"](https://github.com/NightOwl888/lucenenet/blob/analysis-bugz/src/Lucene.Net.Analysis.Common/Analysis/Compound/hyphenation/HyphenationTree.cs#L414)
 (is that right?). It then looks that up with 
[TernaryTree.Find()](https://github.com/NightOwl888/lucenenet/blob/analysis-bugz/src/Lucene.Net.Analysis.Common/Analysis/Compound/hyphenation/HyphenationTree.cs#L415),
 which maps it to "a" (is that right?), and puts the result in as the second 
letter of the word array (that seems right..?). The next letter is "a"; it 
looks that up and gets back "z" (is that right?), which it adds as the 
3rd element in the array (now that can't be right, can it?). The next letters 
it looks up are "r" and "j". The documentation is scarce. I really don't see 
any hope of solving this without running side by side with the Java Lucene to 
see where the paths diverge. That said, the most likely cause has something to 
do with replacing the SAX parser with XmlReader, so that the HyphenationTree 
isn't being populated correctly. But it is difficult to know what "right" is, 
since there are no tests on the HyphenationTree itself.

