Hi all, I would like to share some information about myself and my accepted proposal:
A topic classification engine is currently being developed for Apache Stanbol [1][2] under STANBOL-197 [4]. The implementation plan described in that issue [4] is to use MoreLikeThis queries on a SolrYard instance: each topic is indexed by aggregating the abstracts of all DBpedia entities categorized under a given SKOS topic, and a MoreLikeThis query is then run against that index. However, it uses a Solr-based classifier, and no work has been done with OpenNLP [6] or Apache Mahout [5]. *STANBOL-1294* [3] aims to provide alternative topic classifier implementations based on well-known libraries such as Apache Mahout [5] or OpenNLP [6].

The most recent release of the Topic Classification Engine mentioned in [4] and [7] consists of three modules [8]. Topic engines are expected to contribute fise:TopicAnnotation [9] to the metadata of the content item. I'll start my work from the Enhancement Engine archetype [10].

A general architecture consisting of:

* TopicClassifier
* TrainingSet

allows different implementations for managing the training set (e.g. in Solr, an RDF triple store, a database or simple files in a file system) and for TopicClassifiers (Solr, OpenNLP, Mahout, ...). The TrainingSet part is only required for TopicClassifiers that can dynamically update their classification models [11]. The interfaces themselves will be adapted and improved as more implementations are added, since the current API looks a bit tailored to the Solr-based implementation. If possible, I would also like to implement cross validation in an implementation-independent way [11].

I've divided my work into 4 parts:

1) Different implementations for managing the TrainingSet: In the current approach, the training set has to be stored in Solr, and users have to configure which fields are used for training and which fields are used as categories.
It would be nice to have an abstract API for managing a TrainingSet in Stanbol that is independent of the final backend, which could be Solr or any other storage system.

2) Different implementations of the Classifier: The current classifier API is also completely coupled to the current implementation, so it should be refactored to allow different implementations based on, for instance, frameworks like OpenNLP and Apache Mahout.

3) Adapting the current TopicClassification engine to the new APIs: The existing engine will be changed to work with the refactored TrainingSet and classifier APIs.

4) Evaluation support: Evaluation support will be added.

I am studying at Istanbul Technical University, one of the top universities in Turkey [12]. I am a Senior Software Developer at a company that is developing a search engine for Turkey, where I am the team lead of the Index&Query and Analytics teams. I use many Apache projects and contribute to some of them, including Solr, Lucene and Nutch [13], as well as to Hibernate [18].

During my Bachelor's, I worked on animal sound discrimination and on constraint satisfaction problems, using a novel approach based on multi-dimensional genetic algorithms. For my M.Sc., I am working on machine learning and information retrieval on big data. I have implemented a framework for disambiguation in an agglutinative language, Turkish. I have researched and implemented opinion classification of Democrat and Republican tweets without using user information. In another project, I implemented an algorithm that solves kinematic equations to control a robotic arm for the RoboCup Rescue competition. I have also done research on Simultaneous Localization and Mapping for multiple robots at Yildiz Technical University, Turkey. Developing a chess engine is another project through which I explored artificial intelligence.
My work experience is closely tied to academic research too. I regularly read academic papers, from classic ones to the most recent, and we also collaborate with academics at work. On the other hand, I am a Java lover. I have developed a web search API for SolrCloud in our search engine project, and I have also implemented an index quality application and an analytics application. I am responsible for tuning our Solr instances as Java applications and for cache improvements as well.

Some of the technologies that I have used: “Java, Spring, Spring Security, Struts2, EJB, JSP, Quartz, RESTful Services, Web Services, JAX-RS, JAX-WS, jQuery, CSS, HTML, HTML5, AngularJS, MySQL, PostgreSQL, NoSQL Databases, Hibernate, Logback, JPA, J2SE, J2EE, JavaFX, C, SQL, Oracle, Pascal, Assembly, FreeBSD, Linux, JavaScript, SVN, Maven, Apache Tomcat, Jetty, Embedded Jetty, Grizzly, Hudson, Atlassian: JIRA, Crowd, FishEye, Bamboo, Confluence”.

My assigned *mentors* are *Rupert Westenthaler* and *Andreas Kuckartz*.

*My plan is: 2 weeks for refactoring the current API, 1 week for OpenNLP integration and 2 weeks for Mahout integration. The second part consists of changing the current TopicClassification engine to work with the new APIs, different implementations for managing the TrainingSet, and documentation.*

I am happy to be a part of such a nice project.
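To make the API refactoring step a bit more concrete, here is a minimal sketch of what backend-independent TrainingSet and TopicClassifier interfaces might look like. Every type, method name and signature below is my own assumption for illustration and is not the current Stanbol API. The word-overlap classifier is only a toy stand-in for a Solr, OpenNLP or Mahout backed implementation, and the topic names are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical type: one training example, a text plus the topics assigned to it. */
final class Example {
    final String text;
    final Set<String> topics;

    Example(String text, Set<String> topics) {
        this.text = text;
        this.topics = topics;
    }
}

/** Hypothetical storage abstraction; a real backend could be Solr, a triple store, a database or files. */
interface TrainingSet {
    void add(Example example);
    List<Example> examples();
}

/** Hypothetical classifier abstraction; a real backend could be Solr, OpenNLP or Mahout. */
interface TopicClassifier {
    void train(TrainingSet trainingSet);
    /** Returns a score per known topic; a higher score means a better match. */
    Map<String, Double> classify(String text);
}

/** Simplest possible TrainingSet: keeps the examples in a list in memory. */
class InMemoryTrainingSet implements TrainingSet {
    private final List<Example> examples = new ArrayList<>();

    @Override
    public void add(Example example) {
        examples.add(example);
    }

    @Override
    public List<Example> examples() {
        return Collections.unmodifiableList(examples);
    }
}

/** Toy classifier: a topic's score is the fraction of distinct input words seen in its training texts. */
class WordOverlapClassifier implements TopicClassifier {
    private final Map<String, Set<String>> wordsByTopic = new HashMap<>();

    @Override
    public void train(TrainingSet trainingSet) {
        wordsByTopic.clear();
        for (Example example : trainingSet.examples()) {
            for (String topic : example.topics) {
                wordsByTopic.computeIfAbsent(topic, t -> new HashSet<>()).addAll(tokenize(example.text));
            }
        }
    }

    @Override
    public Map<String, Double> classify(String text) {
        Set<String> words = tokenize(text);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : wordsByTopic.entrySet()) {
            long hits = words.stream().filter(entry.getValue()::contains).count();
            scores.put(entry.getKey(), words.isEmpty() ? 0.0 : (double) hits / words.size());
        }
        return scores;
    }

    /** Lower-cases the text and splits it on non-word characters into a set of distinct tokens. */
    private static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}

class TopicApiSketch {
    public static void main(String[] args) {
        TrainingSet trainingSet = new InMemoryTrainingSet();
        trainingSet.add(new Example("Solr is a search server built on Lucene", Collections.singleton("search")));
        trainingSet.add(new Example("Mahout provides scalable machine learning algorithms", Collections.singleton("ml")));

        TopicClassifier classifier = new WordOverlapClassifier();
        classifier.train(trainingSet);
        // Three of the four distinct input words occur in the "ml" training text.
        System.out.println(classifier.classify("machine learning with Mahout").get("ml")); // prints 0.75
    }
}
```

The point of the split is that the engine and any evaluation code would only depend on the two interfaces, so a Solr, OpenNLP or Mahout backed classifier could be swapped in without touching the callers. A classifier that cannot update its model dynamically could simply ignore the TrainingSet part.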
Thanks,
Furkan KAMACI

[1] (2014, March, 20) Available: https://stanbol.apache.org/
[2] (2014, March, 20) Available: http://www.slideshare.net/fabchrist/what-apachestanbolcandoforyouapacheconeu12-byfchrist
[3] (2014, March, 20) Available: https://issues.apache.org/jira/browse/STANBOL-1294
[4] (2014, March, 20) Available: https://issues.apache.org/jira/browse/STANBOL-197
[5] (2014, March, 20) Available: https://mahout.apache.org/
[6] (2014, March, 20) Available: https://opennlp.apache.org/
[7] (2014, March, 20) Available: http://vimeo.com/45633053
[8] (2014, March, 20) Available: http://search.maven.org/#search|ga|1|stanbol%20topic
[9] (2014, March, 20) Available: http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetopicannotation
[10] (2014, March, 20) Available: http://svn.apache.org/repos/asf/stanbol/trunk/development/archetypes/enhancement-engine/
[11] (2014, March, 20) Available: http://mail-archives.apache.org/mod_mbox/stanbol-dev/201403.mbox/%3CCAA7LAO36croHH-hkUb04SXF2DMu7hWTqhwNA0SzJHiH=1ek...@mail.gmail.com%3E
[12] (2014, March, 20) Available: http://www.topuniversities.com/universities/istanbul-technical-university/postgrad
[13] (2014, March, 20) Available: https://issues.apache.org/jira/secure/ViewProfile.jspa?name=kamaci
[14] (2014, March, 20) Available: http://search-lucene.com
[15] (2014, March, 20) Available: http://lucene.472066.n3.nabble.com/Seeking-New-Moderators-for-solr-user-lucene-td4096447.html
[16] (2014, March, 20) Available: http://www.amazon.com/gp/product/1449359957/ref=pdp_new_dp_review
[17] (2014, March, 20) Available: http://www.manning.com/urma/
[18] (2014, March, 21) Available: https://hibernate.atlassian.net/secure/ViewProfile.jspa?name=kamaci