Michael McCandless wrote:
This stuff is confusing! I think your numbers are not right. Let's
try reformatting with CHAR=POS.
Here's your example without the +1:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 c=8 r=9 u=10 n=11 c=12 h=13 =14 m=15
a=16 n=17
abcd 0-4
crunch 8-14
man 15-18
This is not how Lucene works today. Lucene adds the +1 ("virtual
space character"):
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 =8 c=9 r=10 u=11 n=12 c=13 h=14 =15
m=16 a=17 n=18
abcd 0-4
crunch 9-15
man 16-19
I think?
Man, I'm sorry. I just reran my stuff and it didn't jibe with my earlier
results. It looks like the +1 should stay off. I don't know what
happened...I have Java code and Lucene calculating for me :)
At least that jibes with earlier reports from people saying they have to
insert that space to get things highlighted. Here are the results I get now:
Old:
a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 c=8 r=9 u=10 n=11 c=12 h=13 =14 m=15
a=16 n=17
term:abcd s:0 e:4
term:crunch s:5 e:11
term:man s:12 e:15
New Without +1:
a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 c=8 r=9 u=10 n=11 c=12 h=13 =14 m=15
a=16 n=17
term:abcd s:0 e:4
term:crunch s:8 e:14
term:man s:15 e:18
New With +1:
a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 c=8 r=9 u=10 n=11 c=12 h=13 =14 m=15
a=16 n=17
term:abcd s:0 e:4
term:crunch s:9 e:15
term:man s:16 e:19
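
For anyone who wants to check the three sets of numbers by hand, here is a
minimal sketch in Python (not Lucene code, and not Lucene's API). It
assumes the field has two values, "abcd the" and "crunch man", and that
"the" is dropped as a stop word; both are inferred from the character map
above, not stated in the thread. The "old" rule (advance by the last
emitted token's end offset plus one) is likewise only a guess that happens
to reproduce the Old numbers. The names tokenize and offsets are
hypothetical.

```python
import re

# Assumption: "the" is removed as a stop word, so it never emits a token.
STOP_WORDS = {"the"}

# Assumption: the multi-valued field inferred from the character map above.
VALUES = ["abcd the", "crunch man"]


def tokenize(value):
    """Yield (term, start, end) for each non-stop word, relative to value."""
    for match in re.finditer(r"\S+", value):
        if match.group() not in STOP_WORDS:
            yield match.group(), match.start(), match.end()


def offsets(values, mode):
    """Return [(term, start, end)] across all values of one field.

    mode "old":    base advances by the last emitted token's end offset + 1
    mode "no_gap": base advances by the full length of the value
    mode "gap":    base advances by the full length of the value + 1
    """
    base = 0
    result = []
    for value in values:
        last_end = 0
        for term, start, end in tokenize(value):
            result.append((term, base + start, base + end))
            last_end = end
        if mode == "old":
            base += last_end + 1
        elif mode == "no_gap":
            base += len(value)
        elif mode == "gap":
            base += len(value) + 1
        else:
            raise ValueError("unknown mode: " + mode)
    return result


if __name__ == "__main__":
    for mode in ("old", "no_gap", "gap"):
        print(mode)
        for term, start, end in offsets(VALUES, mode):
            print("term:%s s:%d e:%d" % (term, start, end))
```

Under those assumptions the "no_gap" mode corresponds to New Without +1
and the "gap" mode to New With +1.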
We are on the same page, and I'm sorry for taking you down that path -
except now you might be more sure the +1 doesn't belong ;)
I see that some of my initial and continued confusion was caused by that
char tokenizer bug...your original tests now look right (second abcd
starting at 8 rather than 7).