While rewriting the wiki editor, I noticed that the search snippets for HTML-based wiki pages (and, once I knew what to look for, many other HTML-based items) had words concatenated in various cases where HTML tags sat between them. It took me about five hours to trace the problem to the html_to_plaintext method, mostly because I first had to work out the internals of the FTS engine, the search page facilities, etc. It then took about two hours of carefully tracing the logic in html_to_plaintext and revising it to correct the problems.

I have fixed that code.

I've included an HTML sample that reliably reproduces the failure:

<h1>Fossil Search Guide</h1><div style="font-size: 14.9333px;"><ul><li>Sqlite FTS Search Patterns</li><ul><li><a href="https://www.sqlite.org/fts3.html">FTS3 Guide</a></li><li><a href="https://www.sqlite.org/fts3.html#fts4_options">FTS4 Guide</a></li><li><a href="https://www.sqlite.org/fts5.html">FTS5 Guide</a></li><li><a href="https://www.sqlite.org/fts5.html#appendix_a">Comparison of FTS3/4 with FTS5</a></li></ul><li><a href="https://www.sqlite.org/fts3.html#appendix_a">Search Application Tips</a></li><ul><li><a href="https://www.sqlite.org/fts3.html#section_3">Full-text Index Queries</a></li><li>Near, Unary "-", OR, AND</li><ul><li><a href="https://www.sqlite.org/fts3.html#section_3_2">Standard Query Syntax</a></li><li><a href="https://www.sqlite.org/fts3.html#section_3_1">Enhanced Query Syntax</a></li></ul></ul><li><a href="https://www.sqlite.org/fts3.html#tokenizer">How the tokenizers work</a>&nbsp;(and why HTML doesn't currently search well)</li></ul></div>

This leads to a search result of:

Wiki: kb/fossil-search-guide <http://ds.la/devops/wiki?name=kb/fossil-search-guide> Fossil SearchGuideSqlite FTS Search PatternsFTS3 GuideFTS4 GuideFTS5 GuideComparison of FTS3/4 with FTS5 Search Application Tips Full-text Index QueriesNear, Unary "-", OR, ANDStandard Query SyntaxEnhanced Query SyntaxHow the tokenizers work (and why HTML ...

When rendered, the sample HTML reads:


 Fossil Search Guide

 * Sqlite FTS Search Patterns
     o FTS3 Guide <https://www.sqlite.org/fts3.html>
     o FTS4 Guide <https://www.sqlite.org/fts3.html#fts4_options>
     o FTS5 Guide <https://www.sqlite.org/fts5.html>
     o Comparison of FTS3/4 with FTS5
       <https://www.sqlite.org/fts5.html#appendix_a>
 * Search Application Tips <https://www.sqlite.org/fts3.html#appendix_a>
     o Full-text Index Queries <https://www.sqlite.org/fts3.html#section_3>
     o Near, Unary "-", OR, AND
         + Standard Query Syntax
           <https://www.sqlite.org/fts3.html#section_3_2>
         + Enhanced Query Syntax
           <https://www.sqlite.org/fts3.html#section_3_1>
 * How the tokenizers work
   <https://www.sqlite.org/fts3.html#tokenizer> (and why HTML doesn't
   currently search well)
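
To make the failure mode concrete, here is a minimal Python sketch of the general idea. It is not Fossil's C code and not the actual fix to html_to_plaintext; the function names strip_tags_naive and strip_tags_spaced are hypothetical, and the regex-based tag matching is a simplification I am assuming for illustration. It shows how deleting tags outright fuses neighbouring words, while replacing each tag with a space and then collapsing whitespace keeps them apart.

```python
import html
import re

# Simplified tag matcher (assumption: no ">" inside attribute values).
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_naive(markup: str) -> str:
    """Delete tags outright; text on either side of a tag fuses together."""
    return html.unescape(TAG_RE.sub("", markup))


def strip_tags_spaced(markup: str) -> str:
    """Replace each tag with a space, then collapse runs of whitespace."""
    text = html.unescape(TAG_RE.sub(" ", markup))
    return " ".join(text.split())


# A shortened piece of the sample HTML from this message.
sample = (
    "<h1>Fossil Search Guide</h1><ul><li>Sqlite FTS Search Patterns</li>"
    '<li><a href="https://www.sqlite.org/fts3.html">FTS3 Guide</a></li></ul>'
)

print(strip_tags_naive(sample))
# Fossil Search GuideSqlite FTS Search PatternsFTS3 Guide
print(strip_tags_spaced(sample))
# Fossil Search Guide Sqlite FTS Search Patterns FTS3 Guide
```

One caveat with this sketch: treating every tag as a word break would also split a word such as `<b>un</b>believable` into two tokens, so a production version would probably need to distinguish block-level tags from inline ones.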

David Simmons (dsim)
_______________________________________________
fossil-dev mailing list
fossil-dev@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/fossil-dev
