On 27 March 2013 17:47, Jimmy O'Regan <[email protected]> wrote:
> On 27 March 2013 15:51, Jona Christopher Sahnwaldt <[email protected]> wrote:
>> Hi Jimmy,
>>
>> thanks for your tips! I added/extended two ideas yesterday. I ended up
>> at six to eight paragraphs with 400 to 500 words. Do you think that's
>> too long? The 2012 ideas I looked at were shorter.
>
> What I intended to say didn't come out quite as I meant :) -- more
> information is, of course, better, but it shouldn't be a requirement.
>
> At Apertium, the model we've settled on, over the last few years of
> trial and error, is this:
> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code - a
> brief description, a rationale (why this is necessary), and a link to
> a page that describes the problem in more depth. Last year, we didn't
> have a page for each project, but one of the other guys had some spare
> time this year.
Wow, looks very nice. In what form did you submit the ideas to Google?

>
> As an example (this is more or less the project the student Sebastian
> co-mentored with me was supposed to be working on):
>
> Idea: Wrapper induction for Wiktionary
>
> Description: Given an example of the data to be extracted, and the
> source text to extract from, generate a template for use with the
> Wiktionary module that is capable of extracting that data from the
> source text.
>
> Rationale: The various language editions of Wiktionary contain several
> templates and layout conventions, often multiple templates per
> language, which makes writing extraction templates impractical.
>
> The corresponding page could then have:
>
> * The python library scrapely features wrapper (template) induction
> for HTML; this could be adapted to Mediawiki syntax.
> * Grazer[1] uses existing knowledge to determine how to extract. There
> are many existing resources (morphological dictionaries, pronunciation
> dictionaries, WordNets, etc.) that provide some of the types of data
> that could be used for this purpose. Conversely, information extracted
> from Wiktionary could be used to (semi-)automatically generate RDF
> converters for such resources.
> * It may be desirable to expand nested templates. Many templates,
> e.g., the Turkish inflection templates on en.wiktionary, are specified
> in terms of other templates. These are often difficult to extract from
> in themselves, while their parent simply generates a table. (Sweble is
> reputed to be able to handle nested templates).
>
> [1] Zhao, Shubin, and Jonathan Betz. "Corroborate and learn facts from
> the web." Proceedings of the 13th ACM SIGKDD international conference
> on Knowledge discovery and data mining. ACM, 2007.
>
>
> --
> <Sefam> Are any of the mentors around?
> <jimregan> yes, they're the ones trolling you
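
To make the quoted "wrapper induction" description a bit more concrete, here is a minimal sketch in Python of the general idea: take one example value plus the source text it appears in, and derive a template that can pull the same kind of value out of other pages. Everything in it is an assumption made for illustration. The function names (induce_template, extract) are hypothetical, the sample wikitext is made up, and a line-anchored regular expression stands in for whatever template format the Wiktionary module actually uses. It is not how scrapely or the DBpedia extraction framework work.

```python
import re


def induce_template(source: str, example: str) -> re.Pattern:
    """Derive a line-level regex from one example value found in `source`.

    Hypothetical helper: the literal text before and after the example on
    its line becomes fixed context, and the example becomes a capture group.
    """
    start = source.find(example)
    if start < 0:
        raise ValueError("example value not found in source text")
    end = start + len(example)

    # Locate the boundaries of the line that contains the example.
    line_start = source.rfind("\n", 0, start) + 1  # 0 when there is no newline
    line_end = source.find("\n", end)
    if line_end < 0:
        line_end = len(source)

    prefix = source[line_start:start]
    suffix = source[end:line_end]

    # Anchor to whole lines so an empty prefix or suffix still behaves sensibly.
    pattern = "^" + re.escape(prefix) + "(.+?)" + re.escape(suffix) + "$"
    return re.compile(pattern, re.MULTILINE)


def extract(template: re.Pattern, source: str) -> list[str]:
    """Apply an induced template and return every captured value."""
    return template.findall(source)


# Made-up wikitext for illustration; real entries vary a lot more than this.
training_page = "==Pronunciation==\n* {{IPA|/ev/|lang=tr}}\n"
other_page = "==Pronunciation==\n* {{IPA|/kitap/|lang=tr}}\n==Noun==\n"

template = induce_template(training_page, "/ev/")
values = extract(template, other_page)
print(values)
```

A single literal context like this breaks as soon as a page uses a different template or argument order, which is the problem the rationale above points at. A real implementation would need to generalise over several examples and over nested templates.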
