Yes, I will create a JIRA for it and start working on it as well.

-Giang

> On Feb 23, 2016, at 7:59 PM, Frank McQuillan <[email protected]> wrote:
>
> Jim's approach seems like a reasonable way to go.
>
> Giang, can you create a JIRA for this request? You are welcome to start
> working on it if you would like to contribute this to improve CRF usability.
>
> Frank
>
>> On Tue, Feb 23, 2016 at 3:24 PM, Jim Nasby <[email protected]> wrote:
>>
>>> On 2/23/16 11:07 AM, Nguyen,Giang H wrote:
>>>
>>> I think it could be very helpful if we write a Python script in MADlib to
>>> tokenize words, assign the doc_id and start_pos correspondingly, and
>>> store the result in the database. Users would save a lot of time when
>>> using CRF, and could conveniently run a CRF model on big testing
>>> data.
>>
>> Perhaps the Postgres text search stuff could be used for this (maybe
>> to_tsvector())?
>> --
>> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
>> Experts in Analytics, Data Architecture and PostgreSQL
>> Data in Trouble? Get it in Treble! http://BlueTreble.com
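
To make the proposal above concrete, here is a minimal sketch of the tokenizing step in plain Python. It is only an illustration, not an existing MADlib function: the name tokenize_documents, the regular expression, and the reading of start_pos as a 0-based token index within each document are all assumptions, and storing the rows in the database is left out.

```python
import re

# Assumption: a token is either a run of word characters or a single
# punctuation character. A real CRF pipeline may need different rules.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_documents(docs):
    """Turn (doc_id, text) pairs into (doc_id, start_pos, token) rows.

    Hypothetical helper: start_pos is taken to be the 0-based index of
    the token within its document, which is an assumption about the
    layout the CRF input table expects.
    """
    rows = []
    for doc_id, text in docs:
        for start_pos, match in enumerate(TOKEN_RE.finditer(text)):
            rows.append((doc_id, start_pos, match.group()))
    return rows


if __name__ == "__main__":
    sample = [(1, "Hello, world."), (2, "CRF needs tokens")]
    for row in tokenize_documents(sample):
        print(row)
```

Each row could then be inserted into a table with doc_id, start_pos and token text columns. One thing to weigh against to_tsvector(): with language configurations such as english it stems words and drops stop words, which may not be what a sequence model wants, so a plain tokenizer like this keeps every token in order.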
