On 2/23/16 11:07 AM, Nguyen,Giang H wrote:
I think it could be very helpful if we wrote a Python script in MADlib to 
tokenize words, assign the doc_id and start_pos accordingly, and store the 
result in the database. Users would then save a lot of time when using CRF and 
could conveniently run a CRF model on large test data sets.

Perhaps the Postgres text search stuff could be used for this (maybe to_tsvector())?
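
For reference, the tokenizing step proposed above could be sketched in plain
Python roughly as follows. This is only an illustration under assumptions, not
MADlib code: the token pattern, the function name tokenize_documents, and the
seg_text column name are hypothetical, and start_pos is assumed to be a
zero-based token index within each document rather than a character offset.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical token pattern (an assumption, not MADlib's rule):
# a run of word characters, or a single punctuation character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_documents(
    docs: Iterable[Tuple[int, str]]
) -> Iterator[Tuple[int, int, str]]:
    """Yield one (doc_id, start_pos, seg_text) row per token.

    start_pos is assumed here to be the zero-based token index
    within the document.
    """
    for doc_id, text in docs:
        for start_pos, match in enumerate(TOKEN_RE.finditer(text)):
            yield doc_id, start_pos, match.group(0)


# Example input: (doc_id, raw text) pairs.
docs = [
    (1, "Jim lives in Austin, TX."),
    (2, "MADlib runs in-database."),
]

rows = list(tokenize_documents(docs))
for row in rows:
    print(row)
```

The resulting rows could then be bulk-loaded into a table with any Postgres
client; the table layout itself would need to match whatever the CRF functions
expect.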
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
