Hi Kasun, and thanks for the feedback on the project idea!
You can find my answers inline.
Cheers!

On 3/2/15 5:10 AM, kasun perera wrote:
>
> Forwarding my last email since I didn't get any feedback.
> Thanks
>
> ---------- Forwarded message ----------
>
> Hi Marco and others
>
> I would like to work on the GSoC project "Fact Extraction from Wikipedia Text"
> during this summer.
>
> I went through the project description and the research papers mentioned
> under the description. I have a few questions to clarify.
>
> 1- As mentioned in the project idea the main objective is the
> implementation of a new text extractor. Will this need to be implemented
> inside the current extraction-framework?
Ideally yes.
> Or would it be a completely new
> tool?
>
> 2- It also mentions the use of NLP techniques to process Wikipedia
> text. Does this mean extracting dependency relations to get the
> frame elements (FE) and lexical units (LU)?
Dependency parsing may not be needed, since entity linking can be 
applied to fulfill the task.
> There are several NLP
> libraries like Stanford Parser, RelEx, NLTK, etc. Has any decision
> been made about which NLP library to use?
NLTK could be a way to go if we decide to use Python, but there is no 
constraint on libraries.
The ones that serve our purposes are the good ones. :-)
>
> 3- Also, regarding the content of a Wikipedia page: do we use all the
> sentences from the Wikipedia page? I think it would be better to use
> only the important sentences rather than all of them. If so, we
> have to come up with criteria to select important sentences.
Good point.
I would first proceed with a domain-specific use case (i.e., soccer) to 
assess the feasibility of the idea. Then, we can generalize.
Hence, we want to extract specific facts from sentences that may trigger 
soccer-related frames.
Verb extraction and ranking (i.e., step A of the idea) would cater for 
this task.
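
To make step A a bit more concrete, here is a minimal sketch in Python of
verb extraction and ranking by raw frequency. It is only an illustration
under assumptions: the tagged sentences are made up, the rank_verbs helper
is a hypothetical name, and frequency is just one possible ranking
criterion, not necessarily the one the project will adopt. The input mimics
Penn Treebank POS tags, such as those a tagger like NLTK's would produce,
but is hard-coded so the sketch runs without any external library.

```python
from collections import Counter

# Hypothetical, hand-tagged soccer sentences (Penn Treebank POS tags).
# In a real pipeline these would come from a POS tagger run over
# Wikipedia text; they are hard-coded so the sketch is self-contained.
TAGGED_SENTENCES = [
    [("Messi", "NNP"), ("scored", "VBD"), ("twice", "RB")],
    [("Ronaldo", "NNP"), ("scored", "VBD"), ("a", "DT"), ("penalty", "NN")],
    [("Juventus", "NNP"), ("defeated", "VBD"), ("Milan", "NNP")],
    [("The", "DT"), ("match", "NN"), ("was", "VBD"), ("played", "VBN"),
     ("in", "IN"), ("Turin", "NNP")],
]


def rank_verbs(tagged_sentences):
    """Return (verb, count) pairs sorted by descending frequency.

    A token counts as a verb when its tag starts with "VB". Surface forms
    are lowercased but not lemmatized, so "scored" and "scores" would be
    counted separately.
    """
    counts = Counter(
        token.lower()
        for sentence in tagged_sentences
        for token, tag in sentence
        if tag.startswith("VB")
    )
    return counts.most_common()


if __name__ == "__main__":
    for verb, count in rank_verbs(TAGGED_SENTENCES):
        print(verb, count)
```

The top-ranked verbs would then be the candidates for triggering
soccer-related frames. Note that auxiliaries such as "was" are counted too,
so a real ranking would probably need to filter or down-weight them.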

Cheers!
>
>
>
> --
> Regards
>
> Kasun Perera

-- 
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j

