Thanks, Jörn. Good question about the license. I wrote a web crawler that polls a number of RSS news feeds (mainly Google News and BBC) as well as Wikipedia, and then recursively scrapes the linked pages to a depth of N. So it's hard to say what the license would be. I will look into it more closely, and I may end up using only the Wikipedia data. Thanks
On Fri, Oct 11, 2013 at 3:17 AM, Jörn Kottmann <[email protected]> wrote:

> On 10/10/2013 06:54 PM, Mark G wrote:
>
>> Thanks. I am also working on a rapid model builder framework that I would
>> like you to look at. I posted a description earlier but have had no feedback
>> yet. I was thinking I could check it into the sandbox so everyone can run it,
>> along with a file-based implementation that includes a file of ~200K
>> sentences.
>> This tool would allow users to specify a file of sentences from their data,
>> a file (dictionary) of known named entities, and a blacklist file (for
>> false positive reduction) in order to build a model for a specific entity
>> type.
>
> +1 I posted feedback to this on the user list.
>
> Just go ahead and open a Jira issue for it, and then add it to the sandbox.
>
> What is the license of the sentence file?
>
> Jörn
