On 27 March 2013 15:51, Jona Christopher Sahnwaldt <[email protected]> wrote:
> Hi Jimmy,
>
> thanks for your tips! I added/extended two ideas yesterday. I ended up
> at six to eight paragraphs with 400 to 500 words. Do you think that's
> too long? The 2012 ideas I looked at were shorter.

What I intended to say didn't come out quite as I meant :) -- more
information is, of course, better, but it shouldn't be a requirement.

At Apertium, the model we've settled on, after a few years of trial and
error, is this:
http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code
That is: a brief description, a rationale (why this is necessary), and a
link to a page that describes the problem in more depth. Last year, we
didn't have a page for each project, but one of the other guys had some
spare time this year.

As an example (this is more or less the project that the student
Sebastian and I co-mentored was supposed to be working on):

Idea: Wrapper induction for Wiktionary

Description: Given an example of the data to be extracted, and the
source text to extract from, generate a template for use with the
Wiktionary module that is capable of extracting that data from the
source text.

Rationale: The various language editions of Wiktionary contain several
templates and layout conventions, often multiple templates per language,
which makes writing extraction templates by hand impractical.

The corresponding page could then have:

* The Python library scrapely features wrapper (template) induction for
  HTML; this could be adapted to MediaWiki syntax.
* Grazer[1] uses existing knowledge to determine how to extract. There
  are many existing resources (morphological dictionaries, pronunciation
  dictionaries, WordNets, etc.) that provide some of the types of data
  that could be used for this purpose. Conversely, information extracted
  from Wiktionary could be used to (semi-)automatically generate RDF
  converters for such resources.
* It may be desirable to expand nested templates. Many templates, e.g.,
  the Turkish inflection templates on en.wiktionary, are specified in
  terms of other templates. These are often difficult to extract from
  directly, while their parent simply generates a table. (Sweble is
  reputed to be able to handle nested templates.)

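To make the "Description" above concrete, here is a minimal sketch of the
wrapper induction idea in Python. It is only an illustration under stated
assumptions: the function names induce_wrapper and extract are
hypothetical, it is not scrapely's API or the Wiktionary module's template
format, and it uses the simplest possible technique (learning fixed left
and right delimiters from a single example) rather than anything a real
project would settle for.

```python
import re


def induce_wrapper(source: str, example: str, context: int = 6) -> re.Pattern:
    """Learn a left/right-delimiter pattern from one labelled example.

    Hypothetical helper: `context` is the number of characters kept on
    each side of the example as literal delimiters.
    """
    start = source.find(example)
    if start == -1:
        raise ValueError("example not found in source text")
    end = start + len(example)
    left = source[max(0, start - context):start]
    right = source[end:end + context]
    if not left or not right:
        raise ValueError("example needs delimiting text on both sides")
    # Escape the delimiters so wiki markup such as {{ and | is matched
    # literally, and capture whatever sits between them.
    return re.compile(re.escape(left) + r"(.+?)" + re.escape(right))


def extract(wrapper: re.Pattern, text: str) -> list[str]:
    """Apply an induced wrapper to new text and return every captured value."""
    return wrapper.findall(text)


# One labelled example: the value "Katze" inside a made-up translation line.
wrapper = induce_wrapper("* German: {{t|de|Katze}}", "Katze")
print(extract(wrapper, "* German: {{t|de|Hund}}\n* German: {{t|de|Vogel}}"))
```

A wrapper learned this way breaks as soon as the surrounding markup
varies (a different template name, extra parameters, nesting), which is
exactly the variation the rationale describes and why a real solution
would need to generalise over several examples.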
[1] Zhao, Shubin, and Jonathan Betz. "Corroborate and learn facts from
the web." Proceedings of the 13th ACM SIGKDD international conference on
Knowledge discovery and data mining. ACM, 2007.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
