Hello everyone,

I'm Giang Nguyen, a student at the University of Florida. I have been using
MADlib to get familiar with it and to find out where I could contribute. While
running CRF, one thing I noticed that could cause trouble for users (especially
those unfamiliar with CRF and Postgres) is that they have to manually create
the testing segment table test_segmenttbl(doc_id integer, start_pos integer,
seg_text text) to feed into crf_test_fgen(). This can be tedious, especially
for a large text corpus. I think it would be very helpful to add a Python
script to MADlib that tokenizes the text, assigns the corresponding doc_id and
start_pos to each token, and stores the result in the database. This would save
users a lot of time when using CRF and make it convenient to run a CRF model on
large test data.
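
To make the idea concrete, here is a rough sketch of what such a script might
do. It is only an illustration, not existing MADlib code: the function name
build_segment_rows is hypothetical, the tokenization rule (words and
punctuation marks as separate tokens) is an assumption, and so is numbering
start_pos from 0 by token index within each document.

```python
import re


def build_segment_rows(docs):
    """Turn {doc_id: text} into (doc_id, start_pos, seg_text) rows.

    Hypothetical helper: the row layout mirrors the columns of
    test_segmenttbl(doc_id, start_pos, seg_text).
    """
    rows = []
    for doc_id, text in sorted(docs.items()):
        # Assumption: each word and each punctuation mark is one token.
        tokens = re.findall(r"\w+|[^\w\s]", text)
        # Assumption: start_pos is the 0-based token index in the document.
        for start_pos, token in enumerate(tokens):
            rows.append((doc_id, start_pos, token))
    return rows


if __name__ == "__main__":
    corpus = {1: "Hello, world.", 0: "A b"}
    for row in build_segment_rows(corpus):
        print(row)
```

The resulting rows could then be loaded into test_segmenttbl with whatever
bulk-insert mechanism the database driver provides, using parameterized
queries rather than string concatenation.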


Best,

Giang Nguyen

University of Florida
