On 07/16/2013 01:43 AM, Marvin Humphrey wrote:
On Mon, Jul 15, 2013 at 9:23 AM, Moritz Lenz <[email protected]> wrote:
Some details:

An example search can be found here:
http://irclog.perlgeek.de/perl6/search/?nick=timtoady&q=threads
The backend code is here:
https://github.com/moritz/ilbot/blob/master/lib/Ilbot/Backend/Search.pm

For the indexing I lump together all consecutive lines by the same nick into
one document, and store the database IDs as a comma-separated value in a
second field, and the day in a third field. Each IRC channel has a separate
index.
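
As a rough illustration of the grouping described above (not the actual ilbot code), consecutive rows by the same nick can be folded into one document like this. The row layout and the field names `nick`, `day` and `ids` are assumptions made for the sketch; only the `line` field name appears in this thread.

```perl
use strict;
use warnings;

# Hypothetical input rows: [ database id, nick, day, line text ].
# Consecutive rows by the same nick on the same day become one document.
sub group_lines {
    my @rows = @_;
    my @docs;
    for my $row (@rows) {
        my ($id, $nick, $day, $text) = @$row;
        my $last = $docs[-1];
        if ($last && $last->{nick} eq $nick && $last->{day} eq $day) {
            # Same speaker as the previous row: extend the current document.
            $last->{line} .= "\n$text";
            $last->{ids}  .= ",$id";
        }
        else {
            # New speaker (or new day): start a new document.
            push @docs, { nick => $nick, day => $day, line => $text, ids => "$id" };
        }
    }
    return @docs;
}

my @docs = group_lines(
    [ 1, 'moritz',   '2013-07-15', 'hello' ],
    [ 2, 'moritz',   '2013-07-15', 'anyone here?' ],
    [ 3, 'timtoady', '2013-07-15', 'threads are hard' ],
);
```

Each resulting hash could then be handed to the indexer as one document, with `ids` kept as a stored, non-analyzed field.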

Then when displaying the search results, I retrieve all lines for that day
and channel from the database or cache (which is fast enough, and much
simpler than building complicated queries), and filter out the search
results, plus a few lines before and afterward for context.
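
The filtering step above might look roughly like the following sketch. The helper name `lines_with_context`, the hash layout of a line and the context size are all hypothetical; the only thing taken from the description is that a hit carries a comma-separated list of database IDs.

```perl
use strict;
use warnings;

# Hypothetical helper: given all lines of one day and channel (array ref of
# hashes with an `id` key), the comma-separated ids stored with a search hit,
# and a context size, return the lines to display in their original order,
# each flagged with hit => 1 or hit => 0.
sub lines_with_context {
    my ($day_lines, $ids_csv, $context) = @_;
    my %is_hit = map { $_ => 1 } split /,/, $ids_csv;
    my %keep;
    for my $i (0 .. $#$day_lines) {
        next unless $is_hit{ $day_lines->[$i]{id} };
        # Clamp the context window to the bounds of the day's log.
        my $from = $i - $context;
        $from = 0 if $from < 0;
        my $to = $i + $context;
        $to = $#$day_lines if $to > $#$day_lines;
        $keep{$_} = 1 for $from .. $to;
    }
    return map {
        +{ %{ $day_lines->[$_] }, hit => $is_hit{ $day_lines->[$_]{id} } ? 1 : 0 }
    } sort { $a <=> $b } keys %keep;
}

my @day   = map { +{ id => $_, text => "line $_" } } 1 .. 10;
my @shown = lines_with_context(\@day, '5,6', 2);
```

With two lines of context, a hit on IDs 5 and 6 selects lines 3 through 8, of which only 5 and 6 are flagged as matches.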

So far I'm very happy with these tradeoffs, and like the results.

It's a very nice interface.  Congratulations on a successful design. :)

Thanks. (TBH the design wasn't by me. I provided a useful but ugly service, and a user was sufficiently annoyed to provide a better design; this approach has worked several times for me in the open source community :-)

I wonder whether you might consider making the "line" field stored and
highlightable.

       my $type = Lucy::Plan::FullTextType->new(
           analyzer => $polyanalyzer,
-         stored => 0,
+         highlightable => 1,
       );

I see that you've emboldened the relevant line, but you could go further and
use the Highlighter to emphasize the keywords that were searched for.

     http://lucy.apache.org/docs/perl/Lucy/Docs/Tutorial/Highlighter.html
     http://lucy.apache.org/docs/perl/Lucy/Highlight/Highlighter.html

By default, the Highlighter surrounds keywords with `<strong>` tags, but using
set_pre_tag() and set_post_tag() you can make it use a span with CSS, <blink>
tags, or whatever.

Thanks for the comment.
I'm well aware of the highlighting feature. The reason I don't use it is that (although it's not obvious from the example search I've linked to) there is a large amount of processing going on (escaping HTML, automatically turning URLs into links, inserting zero-width breaking spaces into long words to prevent horizontal scrolling, ...), and I couldn't quite figure out how to mix my own processing with the highlighting from Lucy.

For my purposes it would be much nicer to obtain indexes into the string where the search term was found (or alternatively, to be able to set separate callbacks for both the context and the search results), so that I could do my own processing with that information.

(I guess I could generate a unique string that doesn't yet appear in the string, set it as pre/post tag, and then split on that, but that feels very backwards).
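
For what it's worth, the sentinel workaround could be sketched in plain Perl as below. This is only an illustration under assumptions: `pick_sentinels` and `split_marked` are hypothetical helpers, the marked-up string is produced by a substitution standing in for real highlighter output, and the sketch does not model any HTML encoding the Highlighter may apply to the excerpt itself.

```perl
use strict;
use warnings;

# Pick a pre/post tag pair that does not occur in the text.
# A NUL byte plus a counter is an arbitrary choice for this sketch.
sub pick_sentinels {
    my ($text) = @_;
    my $n = 0;
    $n++ while index($text, "\0${n}<") >= 0 || index($text, "\0${n}>") >= 0;
    return ("\0${n}<", "\0${n}>");
}

# Split a marked-up string into [ is_match, text ] segments and record
# [ offset, length ] of every match relative to the unmarked text.
sub split_marked {
    my ($marked, $pre, $post) = @_;
    my (@segments, @spans);
    my $offset = 0;
    for my $part (split /(\Q$pre\E.*?\Q$post\E)/s, $marked) {
        next if $part eq '';
        my ($is_match, $text) = (0, $part);
        if ($part =~ /\A\Q$pre\E(.*)\Q$post\E\z/s) {
            ($is_match, $text) = (1, $1);
            push @spans, [ $offset, length $text ];
        }
        push @segments, [ $is_match, $text ];
        $offset += length $text;
    }
    return (\@segments, \@spans);
}

my $plain = 'perl6 threads and more threads';
my ($pre, $post) = pick_sentinels($plain);

# Stand-in for highlighter output that uses the sentinels as pre/post tags.
(my $marked = $plain) =~ s/(threads)/$pre$1$post/g;

my ($segments, $spans) = split_marked($marked, $pre, $post);

# Custom per-segment processing: escape HTML, then wrap the matches.
my %esc  = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
my $html = join '', map {
    my ($is_match, $text) = @$_;
    $text =~ s/([&<>])/$esc{$1}/g;
    $is_match ? "<strong>$text</strong>" : $text;
} @$segments;
```

The `$spans` list gives the kind of string indexes asked for above, and the `$segments` list allows separate processing for context and matches, at the cost of the detour through sentinel strings.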

Cheers,
Moritz
