Hi John -

Definite yes vote on pushing tf-idf to factor/extra. A boon for anybody working
on natural language processing with Factor. I found the code easy to extend
(I modified it to work with my NoSQL CouchDB-backed project), & a good stimulus
for learning about vector-based representations of text & all that jazz (cf.
Google's word2vec). Factor is a great education in neat topics of modern
programming, happily open to instructive deconstruction.

Nice work; thanks a lot,
~cw

On Wed, Jul 8, 2015 at 9:11 PM, John Benediktsson <mrj...@gmail.com> wrote:

> Hi cw,
>
> Thanks for letting me know; I forgot to update it. I just pushed a fix,
> if you want to update to the latest re-factor:
>
>
> https://github.com/mrjbq7/re-factor/commit/d3a33c1dde3574db700392701a273c9ab80b7273
>
> Maybe tf-idf would be a good candidate vocab to move to factor/extra.
>
> Best,
> John.
>
>
>
> On Wed, Jul 8, 2015 at 9:00 PM, CW Alston <cwalsto...@gmail.com> wrote:
>
>> Hi all-
>> I've been making good use of the TF-IDF search engine (from the supplementary
>> utilities at https://github.com/mrjbq7/re-factor).
>>
>> After upgrading (on July 6, 2015) to:
>> Factor 0.98 x86.32 (1631, heads/master-0-g16abe47, Wed Jun 17 21:18:37 2015)
>> [Clang (GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53))] on macosx
>>
>> from:
>> Factor 0.98 x86.32 (1569, heads/master-0-g1d1ef90, Thu Apr  9 13:07:50 2015)
>> [Clang (GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57))] on macosx
>>
>> I found that a newer definition of ``assoc-merge'' in assocs.extras breaks
>> compilation of a couple of words in the tf-idf vocab:
>>
>> IN: scratchpad  USING: tf-idf  ; ! (current version)
>>
>> ! ERROR
>> Asset: scores
>>
>> Stack effect declaration is wrong
>> inferred
>> ( x x x -- x )
>> declared
>> ( query db -- scores )
>>
>> Asset: index-all
>>
>> Stack effect declaration is wrong
>> inferred
>> ( x x -- x )
>> declared
>> ( assoc -- index )
>>
>> Here are the pertinent definitions:
>> : scores ( query db -- scores )
>>     [ >lower split-words ] dip '[ _ tf-idf ] map assoc-merge ;
>>
>> : index1 ( path words -- path index )
>>     histogram [ pick swap 2array ] assoc-map ;
>>
>> : index-all ( assoc -- index )
>>     [ index1 ] assoc-map values assoc-merge ;
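>>
>> To make the data flow concrete, here is a small listener check of
>> ``index1'' on a made-up document (the path "a.txt" and its words are
>> hypothetical, and this assumes ``histogram'' comes from math.statistics).
>> Each word maps to a { path count } pair, so ``index-all'' ends up with a
>> sequence of such assocs to merge, which is the ( seq -- merge ) shape the
>> old ``assoc-merge'' took:
>>
>> ```factor
>> USING: arrays assocs kernel math.statistics tools.test ;
>> IN: scratchpad
>>
>> ! Copied from the tf-idf vocab: count each word in one document,
>> ! then pair every count with that document's path.
>> : index1 ( path words -- path index )
>>     histogram [ pick swap 2array ] assoc-map ;
>>
>> ! Hypothetical document "a.txt" containing foo, bar, foo.
>> { "a.txt" H{ { "foo" { "a.txt" 2 } } { "bar" { "a.txt" 1 } } } }
>> [ "a.txt" { "foo" "bar" "foo" } index1 ] unit-test
>> ```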
>>
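>> The problem traces to the newer definition of ``assoc-merge'' in the
>> upgrade: it now takes two assocs ( assoc1 assoc2 -- newassoc ) instead of
>> a single sequence of assocs ( seq -- merge ), so each caller consumes one
>> more input than its declared stack effect allows.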
>>
>> - new definition:
>> USING: assocs kernel math ;
>> IN: assocs.extras  ! NEW
>> : assoc-merge ( assoc1 assoc2 -- newassoc )
>>     [ [ [ assoc-size ] bi@ + ] [ drop ] 2bi new-assoc ] 2keep
>>     [ assoc-merge! ] bi@ ;
>>
>> USING: assocs assocs.private kernel ;
>> IN: assocs.extras  ! NEW
>> : assoc-merge! ( assoc1 assoc2 -- assoc1 )
>>     over [ push-at ] with-assoc assoc-each ;
>>
>> - old definition (in my previous Factor version):
>> USING: kernel sequences ;
>> IN: assocs.extras ! OLD
>> : assoc-merge ( seq -- merge )
>>     H{ } clone [ (assoc-merge) ] reduce ;
>>
>> USING: assocs assocs.private kernel ;
>> IN: assocs.extras ! OLD
>> : (assoc-merge) ( assoc1 assoc2 -- assoc1 )
>>     over [ push-at ] with-assoc assoc-each ;
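>>
>> A sketch of the signature change, assuming the July 2015 assocs.extras
>> definitions quoted above (other releases may differ): the new
>> ``assoc-merge'' takes two assocs and returns a fresh one whose values are
>> vectors, so a word that leaves one sequence on the stack (as `map' does in
>> ``scores'') now needs an extra input.
>>
>> ```factor
>> USING: assocs.extras tools.test ;
>> IN: scratchpad
>>
>> ! Two assocs in, one new hashtable out; values that share a key
>> ! are collected into a vector, in argument order.
>> { H{ { 1 V{ 2 3 } } } }
>> [ H{ { 1 2 } } H{ { 1 3 } } assoc-merge ] unit-test
>> ```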
>>
>> - This solution works for me, defining:
>> : (assoc-merge) ( assoc1 assoc2 -- assoc1 ) ! from old Factor version
>>     over [ push-at ] with-assoc assoc-each ;
>>
>> : seq-assoc-merge ( seq -- merge )
>>     H{ } clone [ (assoc-merge) ] reduce ;
>>
>> - Or just defining ``seq-assoc-merge'' using the new ``assoc-merge!'':
>> : seq-assoc-merge ( seq -- merge )
>>     H{ } clone [ assoc-merge! ] reduce ;
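>>
>> A self-contained listener check of the ``assoc-merge!'' version (it
>> repeats the definition above so it can be pasted on its own; the sample
>> assocs are made up, and it assumes the upgraded assocs.extras):
>>
>> ```factor
>> USING: assocs.extras kernel sequences tools.test ;
>> IN: scratchpad
>>
>> ! Fold a sequence of assocs into one fresh hashtable; assoc-merge!
>> ! pushes each value onto a vector stored under its key.
>> : seq-assoc-merge ( seq -- merge )
>>     H{ } clone [ assoc-merge! ] reduce ;
>>
>> ! Hypothetical inputs: "foo" occurs in both assocs, "bar" in one.
>> ! reduce folds left to right, so "foo" collects 1 before 2.
>> { H{ { "foo" V{ 1 2 } } { "bar" V{ 3 } } } }
>> [ { H{ { "foo" 1 } } H{ { "foo" 2 } { "bar" 3 } } } seq-assoc-merge ] unit-test
>> ```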
>>
>> With these definitions, replacing ``assoc-merge'' with ``seq-assoc-merge''
>> in the tf-idf vocab makes ``scores'' and ``index-all'' compile properly
>> again in the upgraded Factor.
>>
>> Just a heads up, in case anyone tries out the tf-idf vocab (highly
>> recommended: it's a lot of fun figuring out how it works, and it gives
>> good results).
>>
>> Cheers!
>> ~cw
>>
>>
>>
>> --
>> *~ Memento Amori*
>>
>>
>> ------------------------------------------------------------------------------
>> Don't Limit Your Business. Reach for the Cloud.
>> GigeNET's Cloud Solutions provide you with the tools and support that
>> you need to offload your IT needs and focus on growing your business.
>> Configured For All Businesses. Start Your Cloud Today.
>> https://www.gigenetcloud.com/
>> _______________________________________________
>> Factor-talk mailing list
>> Factor-talk@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/factor-talk
>>
>>
>


-- 
*~ Memento Amori*