On 07/16/2013 01:43 AM, Marvin Humphrey wrote:
On Mon, Jul 15, 2013 at 9:23 AM, Moritz Lenz <[email protected]> wrote:
Some details:
An example search can be found here:
http://irclog.perlgeek.de/perl6/search/?nick=timtoady&q=threads
The backend code is here:
https://github.com/moritz/ilbot/blob/master/lib/Ilbot/Backend/Search.pm
For the indexing I lump together all subsequent lines by the same nick into
one document, and store the database IDs as a comma-separated value in a
second field, and the day in a third field. Each IRC channel has a separate
index.
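
A minimal sketch of that grouping step, for illustration only (this is not
the code in Search.pm). The row layout and the field names "ids" and "day"
are assumptions; the sketch also starts a new document whenever the day
changes, which is a guess at how the day field stays unambiguous.

```perl
use strict;
use warnings;

# Hypothetical input: rows of [database id, nick, day, text], in log order.
# Consecutive rows by the same nick (on the same day) become one document.
sub group_by_nick {
    my @rows = @_;
    my @docs;
    for my $row (@rows) {
        my ($id, $nick, $day, $text) = @$row;
        my $last = $docs[-1];    # undef while @docs is still empty
        if ($last && $last->{nick} eq $nick && $last->{day} eq $day) {
            # Same speaker continues: extend the current document.
            $last->{line} .= "\n$text";
            $last->{ids}  .= ",$id";
        }
        else {
            # New speaker (or new day): start a new document.
            push @docs, {
                nick => $nick,
                day  => $day,
                line => $text,
                ids  => "$id",
            };
        }
    }
    return @docs;
}
```

Each returned hash could then be handed to the indexer as one document.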
Then when displaying the search results, I retrieve all lines for that day
and channel from the database or cache (which is fast enough, and much
simpler than building complicated queries), and filter out the search
results, plus a few lines before and afterward for context.
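
The filtering step could look roughly like the sketch below. It assumes
the day's lines arrive as an array of hashes with an "id" key and that the
hit IDs come from splitting the comma-separated field; both are
illustrative assumptions, not the actual ilbot data structures.

```perl
use strict;
use warnings;

# $lines:   array ref of hashes for one day and channel, in log order
# $hit_ids: array ref of database IDs that matched the search
# $context: number of lines to keep before and after each hit
sub with_context {
    my ($lines, $hit_ids, $context) = @_;
    my %is_hit = map { $_ => 1 } @$hit_ids;
    my %keep;
    for my $i (0 .. $#$lines) {
        next unless $is_hit{ $lines->[$i]{id} };
        # Clamp the context window to the bounds of the day's log.
        my $from = $i - $context;
        $from = 0 if $from < 0;
        my $to = $i + $context;
        $to = $#$lines if $to > $#$lines;
        $keep{$_} = 1 for $from .. $to;
    }
    # Overlapping windows collapse because indexes are hash keys.
    return @{$lines}[ sort { $a <=> $b } keys %keep ];
}
```

With the stored ID field in hand, a call might be
with_context(\@day_lines, [ split /,/, $ids ], 2).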
So far I'm very happy with these tradeoffs, and like the results.
It's a very nice interface. Congratulations on a successful design. :)
Thanks. (TBH the design wasn't by me. I provided a useful but ugly
service, and a user was sufficiently annoyed to provide a better design;
this approach has worked several times for me in the open source
community :-)
I wonder whether you might consider making the "line" field stored and
highlightable.
     my $type = Lucy::Plan::FullTextType->new(
         analyzer      => $polyanalyzer,
    -    stored        => 0,
    +    highlightable => 1,
     );
I see that you've emboldened the relevant line, but you could go further and
use the Highlighter to emphasize the keywords that were searched for.
http://lucy.apache.org/docs/perl/Lucy/Docs/Tutorial/Highlighter.html
http://lucy.apache.org/docs/perl/Lucy/Highlight/Highlighter.html
By default, the Highlighter surrounds keywords with `<strong>` tags, but using
set_pre_tag() and set_post_tag() you can make it use a span with CSS, <blink>
tags, or whatever.
Thanks for the comment.
I'm well aware of the highlighting feature. The reason I don't use it is
that (although it's not obvious from the example search I've linked to)
there is a large amount of processing going on (escaping HTML,
automatically turning URLs into links, inserting zero-width breaking
spaces into long words to prevent horizontal scrolling, ...), and I
couldn't quite figure out how to mix my own processing with the
highlighting from Lucy.
For my purposes it would be much nicer to obtain indexes into the string
where the search term was found (or alternatively, let me set separate
callbacks for both the context and the search results) so that I could
do my own processing with that information.
(I guess I could generate a unique string that doesn't yet appear in the
string, set it as pre/post tag, and then split on that, but that feels
very backwards).
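
For the record, that workaround would look roughly like the sketch below.
It does not call Lucy at all: the marked-up string stands in for an
excerpt produced after set_pre_tag() and set_post_tag() were given the
sentinels. The sentinel characters (two private-use code points) are an
assumption and would need to be checked against the actual log text, and
the offsets are relative to the excerpt as returned, not to the original
stored line.

```perl
use strict;
use warnings;

# Given text in which matches are wrapped in $pre ... $post, return the
# text without the markers plus a list of [offset, length] pairs that
# locate each match in that unmarked text.
sub match_offsets {
    my ($marked, $pre, $post) = @_;
    my $plain = '';
    my @spans;
    # \G continues where the previous match ended; /c keeps pos() on failure.
    while ($marked =~ /\G(.*?)\Q$pre\E(.*?)\Q$post\E/gcs) {
        $plain .= $1;
        push @spans, [ length($plain), length($2) ];
        $plain .= $2;
    }
    # Append whatever follows the last marked match.
    $plain .= substr($marked, pos($marked) // 0);
    return ($plain, \@spans);
}

# Hypothetical sentinels, assumed never to occur in the logged text.
my ($PRE, $POST) = ("\x{E000}", "\x{E001}");
my ($text, $spans) =
    match_offsets("use ${PRE}threads${POST} in ${PRE}Perl${POST} 6", $PRE, $POST);
```

The spans could then drive the custom escaping and linking steps, with the
matched ranges wrapped in markup last.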
Cheers,
Moritz