Hi Emilio,

On 3/2/15 11:40 AM, Emilio Dorigatti wrote:
> Hello,
> I am also interested in working on the project about fact extraction
> from Wikipedia text, and I would like to ask for some clarifications about
> the machine learning part of it. The core of the project is to train a
> classifier using a training set built following the approaches described
> in the linked papers. As I understand it, the following tasks are
> needed; given a sentence:
>
> 1a. Identify all the LUs using NLP techniques;
> 1b. Identify all the entities in the sentence which may represent FEs,
> again using NLP techniques (ASRL perhaps?)

Entity linking is the way to go.

> 2. Use the FrameNet definition for the identified LUs to find the
> required FEs;

FrameNet may be either too specific or too complex for crowdsourcing.
Hence, we should adapt/simplify the frame and FE definitions accordingly.

> 3. Ask the user whether a certain entity fits a certain FE (for all
> entities and FEs);
> 4. Understand which is the correct LU based on the meanings given in
> step (3).

The correct LU should already be there, and we want to minimize LU
ambiguity, i.e., how many frames can be triggered by one LU.
Thus, the selection of LUs via verb ranking will be a VERY important step.

>
> In the linked papers little is mentioned about steps (1a) and (1b) (but
> clarification has already been asked for), step (2) is straightforward
> and step (4) has already been implemented; the classifier is needed for
> step (3). Thus, it has to answer questions such as "can this entity be
> this FE?" or "is this entity this FE in this context?" (the latter being
> a lot harder in my opinion). It is not clear to me, though, which
> features should be used to train this classifier.

Good point. I already have a baseline including linguistic features
other than the FEs and frames themselves (which will come as output of the
crowdsourced annotation).
We should first test it, and then tune the features if needed.
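To make step (3) concrete, here is a minimal sketch of how the yes/no questions could be enumerated for one sentence. It is only an illustration, not the existing baseline: the `Candidate` class, the `candidate_pairs` function, and the example sentence, entities, and FE names are all hypothetical and are not taken from FrameNet or DBpedia.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    """One yes/no question for step (3): does `entity` fill `frame_element`?"""
    sentence: str
    lexical_unit: str
    entity: str
    frame_element: str


def candidate_pairs(sentence, lexical_unit, entities, frame_elements):
    """Enumerate every (entity, FE) pair for one sentence and one LU."""
    return [
        Candidate(sentence, lexical_unit, entity, fe)
        for entity, fe in product(entities, frame_elements)
    ]


# Hypothetical example data, assumed for illustration only.
pairs = candidate_pairs(
    sentence="Rossi scored against Juventus.",
    lexical_unit="score",
    entities=["Rossi", "Juventus"],
    frame_elements=["Player", "Opponent"],
)
for c in pairs:
    print(f"Is '{c.entity}' the {c.frame_element} of '{c.lexical_unit}'?")
```

Each `Candidate` is one item that could be shown to a crowd worker or passed to the classifier; the number of questions grows as entities times FEs, which is one more reason to keep the FE definitions simple.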
>
> Frequently, in text classification, there is a one-to-one mapping
> between words and features; in this case FEs have to be used instead of
> words (FrameNet currently recognizes slightly more than 10k FEs). There
> is also a need for features identifying the possible entities, but
> clearly we cannot use the whole DBpedia knowledge base (roughly 4.6
> million entities) for this. I see that FEs belonging to a frame are
> usually of different types, so I think using /classes/ instead of
> /instances/ could be a promising alternative (DBpedia has 685 classes).

+1 for the entity types. This feature is actually implemented as a
suggestion mechanism in the referenced workshop paper, and we could
reuse it as an extra feature.
But first we need to focus on something that works, then we can tune.

> Probably other features are needed though.
>
> Sorry for the long wall of text, I tried to express my thoughts in the
> shortest way I could. What do you think?

That's great feedback, please keep it up!

Cheers!

> Emilio.
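P.S. On the classes-versus-instances idea, a rough sketch of what such a feature vector might look like. This is an assumption for illustration, not the suggestion mechanism from the workshop paper: `CLASSES` is a tiny hand-picked list standing in for the full set of ontology classes, and `type_features` and `pair_features` are hypothetical names.

```python
# Hypothetical short list of ontology classes, assumed for illustration;
# a real feature space would have one slot per class actually used.
CLASSES = ["Person", "Organisation", "Place", "Event"]
CLASS_INDEX = {name: i for i, name in enumerate(CLASSES)}


def type_features(entity_types):
    """Return a binary vector with one slot per class in CLASSES."""
    vector = [0] * len(CLASSES)
    for t in entity_types:
        if t in CLASS_INDEX:  # ignore types outside the chosen class list
            vector[CLASS_INDEX[t]] = 1
    return vector


def pair_features(entity_types, frame_element, frame_elements):
    """Concatenate the entity-type vector with a one-hot FE vector."""
    fe_vector = [1 if fe == frame_element else 0 for fe in frame_elements]
    return type_features(entity_types) + fe_vector


# An entity typed as Person, paired with the (hypothetical) FE "Player".
features = pair_features(["Person"], "Player", ["Player", "Opponent"])
print(features)
```

The vector length depends on the number of classes and FEs rather than on the millions of entity instances, so it stays small enough to feed to a standard classifier.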
-- 
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j

_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
