Re: cvs commit: cocoon-2.1 status.xml

Joerg Heinicke Tue, 09 Mar 2004 03:18:22 -0800

On 09.03.2004 02:39, Vadim Gritsenko wrote:

 public void characters(char[] ch, int start, int length) {
     if (ch.length > 0 && start >= 0 && length > 1) {
-        String text = new String(ch, start, length);
         if (elementStack.size() > 0) {
             IndexHelperField tos = (IndexHelperField) elementStack.peek();
-            tos.appendText(text);
+            tos.appendText(ch, start, length);
         }
-        bodyText.append(text);
+        bodyText.append(' ');
+        bodyText.append(ch, start, length);
     }
 }

What will happen when "keyword" text is streamed as two characters events, "key" and "word"? I think it will become "key word", and indexing will break.

IIUC, the idea was to add a space between tags, so that <p>some</p><p>text</p> is not indexed as "sometext". If that's correct, then a better fix would be to add the space only if a boolean flag had_start_or_end_element_in_between_char_events is set.
Joerg?
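
The flag idea above could look roughly like the following. This is only a minimal sketch: the class name BoundaryAwareTextCollector, the field elementBoundarySeen and the getBodyText() accessor are made up for illustration and are not taken from the Cocoon sources. It only tracks the body text and leaves out the per-element stack.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch, not the actual Cocoon handler. */
class BoundaryAwareTextCollector extends DefaultHandler {
    private final StringBuilder bodyText = new StringBuilder();
    // True if a start or end tag was seen since the last characters() event.
    private boolean elementBoundarySeen = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementBoundarySeen = true;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        elementBoundarySeen = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (length <= 0) {
            return;
        }
        // Insert a separator only when a tag came between two text chunks,
        // so "key" + "word" delivered as two events stays "keyword".
        if (elementBoundarySeen && bodyText.length() > 0) {
            bodyText.append(' ');
        }
        elementBoundarySeen = false;
        bodyText.append(ch, start, length);
    }

    String getBodyText() {
        return bodyText.toString();
    }
}
```

With this, two consecutive characters() calls inside one element are concatenated without a space, while text from <p>some</p><p>text</p> is separated by one.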

Your mail was neither ignored nor accidentally deleted - I just didn't really know what to write, but marked it as important in a nice red color in Mozilla :)

Yes, I see your objection - and I already asked for such objections in the bug report http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25934 ;)

So in which practical use cases might this occur? Maybe it's only a theoretical problem, depending on the "thing" the index is created from? On which SAX stream does the LuceneIndexHandler operate?

I also don't get what you mean by "had_start_or_end_element_in_between_char_events". But I had a look at endElement(). It gets the elements from a stack and already tests for text: if (text != null && text.length() > 0) { Would it make sense to add the space in endElement() if the element contains text, i.e. if the above is true?
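
To make the endElement() variant concrete, here is a rough sketch. The names EndElementSpacingCollector and getBodyText() are invented for illustration, and a plain StringBuilder per element stands in for IndexHelperField, whose real API is not shown in this thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch, not the actual Cocoon handler. */
class EndElementSpacingCollector extends DefaultHandler {
    // One text buffer per open element (stand-in for IndexHelperField).
    private final Deque<StringBuilder> elementStack = new ArrayDeque<>();
    private final StringBuilder bodyText = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementStack.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (length <= 0) {
            return;
        }
        StringBuilder tos = elementStack.peek();
        if (tos != null) {
            tos.append(ch, start, length);
        }
        // No separator here, so split text chunks are joined as-is.
        bodyText.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        StringBuilder text = elementStack.poll();
        // Add the separator only when the closed element had text of its own.
        if (text != null && text.length() > 0) {
            bodyText.append(' ');
        }
    }

    String getBodyText() {
        // Trim the separator left behind by the last closed element.
        return bodyText.toString().trim();
    }
}
```

In this sketch the space is written only after a closing tag, so text directly in front of a nested start tag, as in <p>some<b>bold</b></p>, would still run into the child's text.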

Joerg
