Michael McCandless wrote:
This stuff is confusing! I think your numbers are not right. Let's try reformatting with CHAR=POS.

Here's your example without the +1:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 c=8 r=9 u=10 n=11 c=12 h=13 =14 m=15 a=16 n=17

  abcd 0-4
crunch 8-14
   man 15-18

This is not how Lucene works today.  Lucene adds the +1 ("virtual
space character"):

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 =8 c=9 r=10 u=11 n=12 c=13 h=14 =15 m=16 a=17 n=18

  abcd 0-4
crunch 9-15
   man 16-19

I think?

Man, I'm sorry. I just reran my stuff and it didn't jibe with my earlier results. It looks like the +1 should stay off. I don't know what happened... I have Java code and Lucene calculating for me :)

At least that jibes with earlier reports of people saying they have to insert that space to get things highlighted. Here are the results I get now:

Old:
a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 c=8 r=9 u=10 n=11 c=12 h=13 =14 m=15 a=16 n=17
term:abcd s:0 e:4
term:crunch s:5 e:11
term:man s:12 e:15

New Without +1:

a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 c=8 r=9 u=10 n=11 c=12 h=13 =14 m=15 a=16 n=17
term:abcd s:0 e:4
term:crunch s:8 e:14
term:man s:15 e:18

New With +1:

a=0 b=1 c=2 d=3 =4 t=5 h=6 e=7 c=8 r=9 u=10 n=11 c=12 h=13 =14 m=15 a=16 n=17
term:abcd s:0 e:4
term:crunch s:9 e:15
term:man s:16 e:19
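
For anyone checking the arithmetic by hand, here is a minimal sketch in plain Java (not Lucene code) that reproduces the two "New" result sets above. It assumes the field has two values, "abcd the" and "crunch man" (my reading of the char map), that "the" is dropped as a stop word, and that tokens are split on whitespace. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class OffsetGapSketch {
    // Hypothetical stop-word set, for illustration only.
    static final Set<String> STOP_WORDS = Set.of("the");

    /**
     * Returns "term start-end" for each kept token across all field values.
     * gap is the number of virtual characters added between values (0 or 1 here).
     */
    static List<String> offsets(String[] values, int gap) {
        List<String> out = new ArrayList<>();
        int base = 0; // offset at which the current value starts
        for (String value : values) {
            int i = 0;
            while (i < value.length()) {
                // Skip whitespace between tokens.
                while (i < value.length() && Character.isWhitespace(value.charAt(i))) {
                    i++;
                }
                int start = i;
                // Consume one whitespace-delimited token.
                while (i < value.length() && !Character.isWhitespace(value.charAt(i))) {
                    i++;
                }
                if (i > start) {
                    String term = value.substring(start, i);
                    if (!STOP_WORDS.contains(term)) {
                        out.add(term + " " + (base + start) + "-" + (base + i));
                    }
                }
            }
            // The next value starts after the full length of this one, plus the gap.
            base += value.length() + gap;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] values = {"abcd the", "crunch man"};
        System.out.println(offsets(values, 0)); // [abcd 0-4, crunch 8-14, man 15-18]
        System.out.println(offsets(values, 1)); // [abcd 0-4, crunch 9-15, man 16-19]
    }
}
```

The only difference between the two runs is whether one extra character is counted between the values, which is the +1 under discussion.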


We are on the same page, and I'm sorry for taking you down that path - except now you might be more sure the +1 doesn't belong ;)

I see that some of my initial and continued confusion was caused by that char tokenizer bug. Your original tests now look right (second abcd starting at 8 rather than 7).
