On 27 March 2013 17:47, Jimmy O'Regan <[email protected]> wrote:
> On 27 March 2013 15:51, Jona Christopher Sahnwaldt <[email protected]> wrote:
>> Hi Jimmy,
>>
>> thanks for your tips! I added/extended two ideas yesterday. I ended up
>> at six to eight paragraphs with 400 to 500 words. Do you think that's
>> too long? The 2012 ideas I looked at were shorter.
>
> What I intended to say didn't come out quite as I meant :) -- more
> information is, of course, better, but it shouldn't be a requirement.
>
> At Apertium, the model we've settled on, over the last few years of
> trial and error, is this:
> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code - a
> brief description, a rationale (why this is necessary), and a link to
> a page that describes the problem in more depth. Last year, we didn't
> have a page for each project, but one of the other guys had some spare
> time this year.

Wow, looks very nice. In what form did you submit the ideas to Google?

>
> As an example (this is more or less the project that the student
> Sebastian and I co-mentored was supposed to be working on):
>
> Idea: Wrapper induction for Wiktionary
>
> Description: Given an example of the data to be extracted, and the
> source text to extract from, generate a template for use with the
> Wiktionary module that is capable of extracting that data from the
> source text.
>
> Rationale: The various language editions of Wiktionary contain several
> templates and layout conventions, often multiple templates per
> language, which makes writing extraction templates impractical.
>
> The corresponding page could then have:
>
> * The Python library scrapely features wrapper (template) induction
> for HTML; this could be adapted to MediaWiki syntax.
> * Grazer[1] uses existing knowledge to determine how to extract. There
> are many existing resources (morphological dictionaries, pronunciation
> dictionaries, WordNets, etc.) that provide some of the types of data
> that could be used for this purpose. Conversely, information extracted
> from Wiktionary could be used to (semi-)automatically generate RDF
> converters for such resources.
> * It may be desirable to expand nested templates. Many templates,
> e.g., the Turkish inflection templates on en.wiktionary, are specified
> in terms of other templates. These are often difficult to extract from
> in themselves, while their parent simply generates a table. (Sweble is
> reputed to be able to handle nested templates.)
>
> [1] Zhao, Shubin, and Jonathan Betz. "Corroborate and learn facts from
> the web." Proceedings of the 13th ACM SIGKDD international conference
> on Knowledge discovery and data mining. ACM, 2007.
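
To make the "wrapper induction" description above concrete, here is a rough
Python sketch. It is not scrapely's algorithm and not the Wiktionary module's
template format; the function names induce_template and extract, and the
fixed-width context window, are assumptions made purely for illustration.
Given one example value and the wikitext it came from, it learns the literal
text around the value and reuses that as an extraction pattern.

```python
import re


def induce_template(source, example, context=20):
    """Learn a regex 'wrapper' from one example value found in source.

    Hypothetical helper: takes up to `context` characters on each side of
    the example as literal anchors and captures whatever lies between them.
    """
    idx = source.find(example)
    if idx < 0:
        raise ValueError("example not found in source")
    end = idx + len(example)
    prefix = source[max(0, idx - context):idx]
    suffix = source[end:end + context]
    # Escape the anchors so wikitext characters like { | } match literally.
    return re.compile(re.escape(prefix) + r"(.+?)" + re.escape(suffix), re.DOTALL)


def extract(template, text):
    """Apply an induced template to new text and return all captured values."""
    return [m.group(1) for m in template.finditer(text)]


if __name__ == "__main__":
    page = "==Noun==\n{{en-noun|cats}}\n"
    template = induce_template(page, "cats", context=10)
    print(extract(template, "==Noun==\n{{en-noun|dogs}}\n"))
```

A real implementation would need to generalise over several examples and cope
with the nested templates mentioned above; this only shows the basic
example-in, template-out shape of the problem.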
>
>
> --
> <Sefam> Are any of the mentors around?
> <jimregan> yes, they're the ones trolling you

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
