Hi Kasun and thanks for the feedback on the project idea! You can find my answers inline. Cheers!
On 3/2/15 5:10 AM, kasun perera wrote:
> Forwarding my last email since I didn't get any feedback.
> Thanks
>
> ---------- Forwarded message ----------
>
> Hi Marco and others
>
> I like to work on the Gsoc project "Fact Extraction from Wikipedia Text"
> during this summer.
>
> I went through the project description and the research papers mentioned
> under the description. I have few questions to clarify.
>
> 1- As mentioned in the project idea the main objective is the
> implementation of a new text extractor. Will this need to be implemented
> inside the current extraction-framework?

Ideally yes.

> Or would it be a completely new
> tool?
>
> 2- Also it mentioned the use of NLP techniques to process Wikipedia
> text. Does this means extraction of Dependency relationships to get the
> frame elements (FE) and lexical unit(LU)?

Dependency parsing may not be needed, since entity linking can be applied to fulfill the task.

> There are several NLP
> libraries like Stanford parser, RelEx, NLTK etc. Is there any decision
> made which NLP library to use?

NLTK could be a way to go if we decide to use Python, but there is no constraint on libraries. The ones that serve our purposes are the good ones. :-)

> 3- Also regarding the content of a Wikipedia page; do we use all the
> sentences from the Wikipedia page? My idea is it's better if we can use
> important sentences rather than all the sentences. If that is the better
> idea we have to come up with a criteria to select important sentences.

Good point. I would first proceed with a domain-specific use case (i.e., soccer) to assess the feasibility of the idea. Then, we can generalize.
Hence, we want to extract specific facts from sentences that may trigger soccer-related frames. Verb extraction and ranking (i.e., step A of the idea) would cater for this task.

Cheers!
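To make the verb extraction and ranking step a bit more concrete, here is a minimal sketch in Python. It is only one possible reading of that step, not the project's actual method. Everything in it is an assumption made for illustration: the hand-written SOCCER_VERB_FORMS map stands in for a real POS tagger and lemmatizer (e.g., from NLTK), the sentence splitter is deliberately naive, and the sample text and function names are made up.

```python
import re
from collections import Counter

# Hypothetical, hand-written map from inflected forms to verb lemmas.
# A real pipeline would obtain verbs from a POS tagger and lemmatizer.
SOCCER_VERB_FORMS = {
    "scored": "score", "scores": "score", "score": "score",
    "played": "play", "plays": "play", "play": "play",
    "won": "win", "wins": "win", "win": "win",
    "signed": "sign", "signs": "sign", "sign": "sign",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def verbs_in(sentence):
    """Return the lemmas of the known verb forms found in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [SOCCER_VERB_FORMS[t] for t in tokens if t in SOCCER_VERB_FORMS]


def rank_verbs(sentences):
    """Rank verb lemmas by raw frequency over all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(verbs_in(sentence))
    return counts.most_common()


def select_sentences(sentences, top_verbs):
    """Keep only the sentences that contain at least one top-ranked verb."""
    wanted = set(top_verbs)
    return [s for s in sentences if wanted & set(verbs_in(s))]


# Made-up sample text, for illustration only.
text = ("He was born in Rome. He scored 20 goals in 2001. "
        "The club won the league. He scored again in the final.")
sentences = split_sentences(text)
ranking = rank_verbs(sentences)
top = [verb for verb, _ in ranking[:1]]
selected = select_sentences(sentences, top)
```

Raw frequency is the simplest ranking one could pick here. Any other score (e.g., one that compares the soccer corpus against general text) would plug into rank_verbs without changing the sentence selection.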
> --
> Regards
>
> Kasun Perera

--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j

_______________________________________________
Dbpedia-gsoc mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc
