On Sat, Apr 13, 2013 at 6:50 PM, Jimmy O'Regan <[email protected]> wrote:
> On 13 April 2013 05:57, Shivani Poddar <[email protected]> wrote:
> > On Fri, Apr 12, 2013 at 3:02 PM, Jimmy O'Regan <[email protected]> wrote:
> >> If you're interested in information extraction of this kind, the good
> >> news is that we have the data from the infoboxes, and that could be
> >> used for semi-supervised creation of this kind of extraction template.
> >> If your idea was based around something related to this, that could
> >> make a great project.
> >
> > This does seem to cover a major part of my interest. My eventual goals
> > (which are research based) would definitely look at the amalgam of the
> > 3 concepts Pablo mentioned, but as of now, for an immediate project,
> > this seems very interesting to me. I would like to take it up for the
> > coming summer.
> > Also, by the semi-supervised creation of templates, do you mean a
> > template for (say) Hindi only? Or would extending it to all languages
> > be fine?
> >
>
> The example I gave was deliberately basic, and can be achieved by
> iterating through a list of properties and checking if the abstract
> contains either the string (e.g., name) or a regex match (date). You
> could either replace the occurrence with the property name (as in my
> example), or surround it with XML-like tags, to be more suitable as
> input to something like MinorThird
> (http://teamcohen.github.io/MinorThird/).
>
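To check my understanding of the basic example, here is a minimal sketch of
that loop. Everything in it is an assumption for illustration: the property
names, the `{property}` placeholder format, and especially the date regex,
which is only a naive stand-in for the extraction framework's date handling.

```python
import re

# Hypothetical infobox data: property name -> (value, is_regex).
PROPERTIES = {
    "name": ("David Beckham", False),
    # Naive stand-in only; the framework's date handling should be used instead.
    "birth_date": (r"\d{1,2} [A-Z][a-z]+ \d{4}", True),
}


def make_template(abstract, properties, tagged=False):
    """Mark every occurrence of an infobox value in the abstract.

    With tagged=False the occurrence is replaced by a {property} placeholder;
    with tagged=True it is wrapped in XML-like tags instead.
    """
    template = abstract
    for prop, (value, is_regex) in properties.items():
        # Plain strings are escaped so they are matched literally.
        pattern = value if is_regex else re.escape(value)
        if tagged:
            def repl(m, p=prop):
                return "<{0}>{1}</{0}>".format(p, m.group(0))
        else:
            def repl(m, p=prop):
                return "{" + p + "}"
        template = re.sub(pattern, repl, template)
    return template


abstract = "David Beckham (born 2 May 1975) is an English footballer."
print(make_template(abstract, PROPERTIES))
print(make_template(abstract, PROPERTIES, tagged=True))
```

The tagged form keeps the original text inside the tags, which seems closer
to what a tool like MinorThird would expect as labelled input.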
Yes, I read the Infobox documentation at point 2.3 in
http://wiki.dbpedia.org/DeveloperDocumentation/Extractor?v=vqu#h110-6 which
makes the stub for the required code very clear. The documentation also
says that it supports all languages. Does it?
>
> That's simple enough that, with the caveat that the extraction
> framework's date handling should be used (which will involve gaining
> some small level of familiarity with that code), it could make a good
> coding challenge for this idea.
Does this refer to extending the Infobox implementation to achieve *i18n
data fusion*? Or to extending language support in other extractors that are
not documented as supporting all languages, for instance the Article
Categories Extractor, Category Label Extractor, Homepage Extractor, etc.?
Regards,
Shivani
> That would be relatively language independent, for languages with
> simple morphology (it might work for Hindi, but would probably not
> work for Sanskrit), but would require a language processing pipeline
> for more complex languages.
>
> To be more dbpedia-specific, i.e., instead of the abstract text, using
> the MediaWiki source text:
> '''David Robert Joseph Beckham''' ([[Londres]], [[Ingalaterra]],
> [[1975]]eko [[maiatzaren 2]]a) futbolari ingelesa da.
> would make it more-or-less language independent[1], and would simplify
> the matching of text to occurrences (though it would possibly make it
> more difficult to prepare the text as input to something like
> MinorThird).
>
> In any case, the thing to bear in mind is that the same value may
> appear with a number of attributes - Dublin is the largest city in
> Ireland, as well as its capital; many kings were sons of their
> predecessors, etc. - or even independent of them (i.e., the value may
> appear in a sentence in a way that has nothing to do with any of the
> possible attributes).
>
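One way to make this ambiguity visible before building templates is to index
the infobox by value and flag any value that belongs to more than one
attribute. This is only a sketch; the property names and the flat
name-to-value dictionary are assumptions, not the framework's data model. It
does not address the other case mentioned above, where a value appears in a
sentence independently of any attribute.

```python
from collections import defaultdict


def index_by_value(infobox):
    """Map each value to the list of properties that carry it."""
    by_value = defaultdict(list)
    for prop, value in infobox.items():
        by_value[value].append(prop)
    return dict(by_value)


def ambiguous_values(infobox):
    """Return only the values that appear under more than one property."""
    return {
        value: props
        for value, props in index_by_value(infobox).items()
        if len(props) > 1
    }


# Hypothetical infobox for Ireland.
infobox = {"capital": "Dublin", "largest_city": "Dublin", "currency": "Euro"}
print(ambiguous_values(infobox))
```

An occurrence of a flagged value could then be left unlabelled, or labelled
with all candidate attributes, rather than being assigned to one arbitrarily.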
> [1] The variation in the display text ([[dog|dogs]] or [[dog]]s) may
> need to be handled to generalise the templates better, particularly
> for complex languages, but I haven't given it too much thought.
>
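The display-text variation in [1] could be handled by separating each link's
target from the text that is actually displayed. The sketch below is a
simplification under stated assumptions: it treats any run of letters
directly after `]]` as part of the displayed text, whereas MediaWiki's actual
rule for such trailing characters varies by language, and it ignores nested
links, templates, and namespaces.

```python
import re

# [[target]], [[target|label]], optionally followed by trailing letters.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]([^\W\d_]*)")


def parse_links(wikitext):
    """Return (target, displayed_text) pairs for each wiki link."""
    links = []
    for m in WIKILINK.finditer(wikitext):
        target = m.group(1).strip()
        # Without a pipe, the target doubles as the label.
        label = m.group(2) if m.group(2) is not None else m.group(1)
        links.append((target, label + m.group(3)))
    return links


source = ("'''David Robert Joseph Beckham''' ([[Londres]], [[Ingalaterra]], "
          "[[1975]]eko [[maiatzaren 2]]a) futbolari ingelesa da.")
for target, shown in parse_links(source):
    print(target, "->", shown)
```

Matching infobox values against the link targets rather than the displayed
text would treat `[[dog|dogs]]` and `[[dog]]s` identically.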
>
> --
> <Sefam> Are any of the mentors around?
> <jimregan> yes, they're the ones trolling you
>
_______________________________________________
Dbpedia-gsoc mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-gsoc