While rewriting the wiki editor, I noticed that the search-result
snippets for HTML-based wiki pages (and, once I knew what to look for,
many other HTML-based items) had words concatenated in various cases
where they were separated only by HTML tags. It took me about five hours
to track the culprit down to the html_to_plaintext method, mostly
because I first had to figure out the guts of the FTS engine, the search
page facilities, etc. It then took about two hours of carefully tracing
and revising the logic in html_to_plaintext to correct the problems.
I have fixed that code.
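For readers who want to see the class of bug in isolation, here is a
minimal Python sketch. It is not the Fossil C code and not the actual
fix; the function names strip_tags_naive and strip_tags_spaced are
hypothetical, and the regular expression is a simplification that
assumes no ">" appears inside attribute values.

```python
import re

# Simplified tag matcher: assumes no '>' inside attribute values.
TAG = re.compile(r"<[^>]*>")

def strip_tags_naive(html):
    # Deleting tags outright fuses words that were separated only by markup.
    return TAG.sub("", html)

def strip_tags_spaced(html):
    # Replace each tag with a space, then collapse runs of whitespace.
    return " ".join(TAG.sub(" ", html).split())

sample = "<h1>Fossil Search Guide</h1><ul><li>Sqlite FTS Search Patterns</li></ul>"
print(strip_tags_naive(sample))   # Fossil Search GuideSqlite FTS Search Patterns
print(strip_tags_spaced(sample))  # Fossil Search Guide Sqlite FTS Search Patterns
```

Replacing every tag with a space is the bluntest possible approach: it
would also split a word that contains inline markup, such as
"<b>H</b>ello". A more careful converter might add a separator only for
block-level tags.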
I've included an HTML sample that reliably reproduces the failure:
<h1>Fossil Search Guide</h1><div style="font-size:
14.9333px;"><ul><li>Sqlite FTS Search Patterns</li><ul><li><a
href="https://www.sqlite.org/fts3.html">FTS3 Guide</a></li><li><a
href="https://www.sqlite.org/fts3.html#fts4_options">FTS4
Guide</a></li><li><a href="https://www.sqlite.org/fts5.html">FTS5
Guide</a></li><li><a
href="https://www.sqlite.org/fts5.html#appendix_a">Comparison of FTS3/4
with FTS5</a></li></ul><li><a
href="https://www.sqlite.org/fts3.html#appendix_a">Search Application
Tips</a></li><ul><li><a
href="https://www.sqlite.org/fts3.html#section_3">Full-text Index
Queries</a></li><li>Near, Unary "-", OR, AND</li><ul><li><a
href="https://www.sqlite.org/fts3.html#section_3_2">Standard Query
Syntax</a></li><li><a
href="https://www.sqlite.org/fts3.html#section_3_1">Enhanced Query
Syntax</a></li></ul></ul><li><a
href="https://www.sqlite.org/fts3.html#tokenizer">How the tokenizers
work</a> (and why HTML doesn't currently search well)</li></ul></div>
This leads to the following search result:
Wiki: kb/fossil-search-guide
<http://ds.la/devops/wiki?name=kb/fossil-search-guide>
Fossil SearchGuideSqlite FTS Search PatternsFTS3 GuideFTS4 GuideFTS5
GuideComparison of FTS3/4 with FTS5 Search Application Tips Full-text
Index QueriesNear, Unary "-", OR, ANDStandard Query SyntaxEnhanced Query
SyntaxHow the tokenizers work (and why HTML ...
When rendered, the HTML sample reads:
Fossil Search Guide
  * Sqlite FTS Search Patterns
      o FTS3 Guide <https://www.sqlite.org/fts3.html>
      o FTS4 Guide <https://www.sqlite.org/fts3.html#fts4_options>
      o FTS5 Guide <https://www.sqlite.org/fts5.html>
      o Comparison of FTS3/4 with FTS5
        <https://www.sqlite.org/fts5.html#appendix_a>
  * Search Application Tips <https://www.sqlite.org/fts3.html#appendix_a>
      o Full-text Index Queries <https://www.sqlite.org/fts3.html#section_3>
      o Near, Unary "-", OR, AND
          + Standard Query Syntax
            <https://www.sqlite.org/fts3.html#section_3_2>
          + Enhanced Query Syntax
            <https://www.sqlite.org/fts3.html#section_3_1>
  * How the tokenizers work
    <https://www.sqlite.org/fts3.html#tokenizer> (and why HTML doesn't
    currently search well)
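To illustrate why fused words matter for searching, the Python sketch
below uses a rough stand-in for a word tokenizer (lowercased runs of
letters and digits). It is an approximation for illustration only, not
the SQLite FTS tokenizer, and it assumes the same flattened text is
what gets tokenized. The tokenize helper is hypothetical; the fused
string is an excerpt of the search result above.

```python
import re

def tokenize(text):
    # Rough stand-in for a word tokenizer: lowercased runs of letters/digits.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Excerpt of the snippet above, and the same text with words kept apart.
fused = "Fossil SearchGuideSqlite FTS Search PatternsFTS3 GuideFTS4 Guide"
spaced = "Fossil Search Guide Sqlite FTS Search Patterns FTS3 Guide FTS4 Guide"

print("sqlite" in tokenize(fused))    # False
print("sqlite" in tokenize(spaced))   # True
print("patterns" in tokenize(fused))  # False
```

Under this approximation, "sqlite" exists only inside the single token
"searchguidesqlite", so a query for it would not match the fused text.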
David Simmons (dsim)