Hi David,

Funny you mention this, as I was working on it today. I have been using 
PostgreSQL's pg_trgm for years and have been interested in implementing this in 4D. 
The code to create n-grams is pretty simple. Putting trigrams in a keyword 
indexed text field makes it super fast to find matches. The main problem seems 
to be finding too many matches. Scoring the results is a lot less efficient 
than finding the candidates. Still looking into it.
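
For illustration only, here is a minimal sketch of the approach described above, written in Python rather than 4D. The function names (trigrams, similarity, build_index, search), the 0.3 threshold, and the sample records are all assumptions made for the example. The word padding (two spaces in front, one behind) and the shared-over-union score follow how pg_trgm is documented to behave, but this is not its implementation. The dictionary index stands in for a keyword-indexed text field.

```python
import re
from collections import defaultdict

def trigrams(text):
    """Return the set of trigrams in text, padding each word with
    two leading spaces and one trailing space (pg_trgm-style)."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def similarity(a, b):
    """Score two strings as shared trigrams divided by all distinct trigrams."""
    ga, gb = trigrams(a), trigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

def build_index(records):
    """Map each trigram to the ids of the records that contain it.
    This plays the role of the keyword index (hypothetical stand-in)."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for gram in trigrams(text):
            index[gram].add(rec_id)
    return index

def search(query, records, index, threshold=0.3):
    """Find candidates through the index (cheap), then score each
    candidate (the costly step) and keep those at or above threshold."""
    candidates = set()
    for gram in trigrams(query):
        candidates |= index.get(gram, set())
    scored = [(similarity(query, records[rec_id]), rec_id) for rec_id in candidates]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)

if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real table.
    records = {1: "words", 2: "sword", 3: "apple"}
    index = build_index(records)
    for score, rec_id in search("word", records, index):
        print(rec_id, records[rec_id], round(score, 3))
```

The index lookup returns every record that shares at least one trigram with the query, which is why the candidate set tends to be large and the scoring pass dominates the cost.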

I'm still using the LCS code you published years ago. Thanks very much for 
that. If anything useful comes out of my exploration of trigrams, I'll try to 
do the same.

John DeSoi, Ph.D.



> On Jul 6, 2017, at 6:17 PM, David Adams via 4D_Tech <4d_tech@lists.4d.com> 
> wrote:
> 
> Next up the complexity chain is n-gram comparison (sometimes called q-gram
> comparison, for no clear reason.) N-gram is a confusing term now because
> historically it meant "strings of a certain length", like take a word and
> break it into 3-character strings. n=3. 3 is a good length, based on
> research. It's confusing now because Google's public n-gram data sets and
> tools are based on proximate words, not strings. Anyway, n-gram analysis is
> very powerful and proven tech...but I've failed to get great results in 4D.
> It could very well be me...I haven't had enough time/attention to ever
> really dive into this in recent years.

