Hi All;

I would like to share some information about myself and my accepted
proposal:

A topic classification engine is currently being developed for Apache
Stanbol [1][2] under STANBOL-197 [4]. The plan described in that issue [4]
is to run MoreLikeThis queries on a SolrYard instance in which each topic
is indexed by aggregating the text of the abstracts of all DBpedia entities
categorized under a given SKOS topic. However, that classifier is Solr
based, and no work has been done with OpenNLP [6] or Apache Mahout [5].
*STANBOL-1294* [3] aims to provide alternative implementations of the topic
classifier based on well-known libraries such as Apache Mahout [5] or
OpenNLP [6].

The most recent release of the Topic Classification Engine mentioned in [4]
and [7] consists of three modules [8]. Topic engines are expected to
contribute fise:TopicAnnotation [9] to the metadata of the content item.
I'll start my work from the Enhancement-Engine archetype [10]. A general
architecture consisting of:

* TopicClassifier
* TrainingSet

allows different implementations for managing the training set (e.g. in
Solr, an RDF triple store, a database or simple files in a file system) and
for TopicClassifiers (Solr, OpenNLP, Mahout, ...). The TrainingSet part is
only required for TopicClassifiers that can dynamically update their
classification models [11].
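
As a rough illustration of this split, here is a minimal Java sketch. It is
not the existing Stanbol API: the method names (register, examples, train,
suggestTopics), the TrainingExample type, and the two toy implementations
are assumptions made up for this example. It only shows how a storage
backend and a classifier backend could vary independently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: all types and method names below are illustrative
// assumptions, not the existing Stanbol interfaces.

/** One labelled training document. */
final class TrainingExample {
    final String text;
    final Set<String> topics;

    TrainingExample(String text, Set<String> topics) {
        this.text = text;
        this.topics = Collections.unmodifiableSet(new HashSet<>(topics));
    }
}

/** Storage side: could be backed by Solr, a triple store, a database or files. */
interface TrainingSet {
    void register(TrainingExample example);
    List<TrainingExample> examples();
}

/** Classification side: could be backed by Solr, OpenNLP or Mahout. */
interface TopicClassifier {
    void train(TrainingSet trainingSet);
    List<String> suggestTopics(String text);
}

/** Simplest possible backend: keeps the examples in a list. */
final class InMemoryTrainingSet implements TrainingSet {
    private final List<TrainingExample> examples = new ArrayList<>();

    @Override
    public void register(TrainingExample example) { examples.add(example); }

    @Override
    public List<TrainingExample> examples() {
        return Collections.unmodifiableList(examples);
    }
}

/** Toy classifier: picks the topic whose training words overlap the text most. */
final class WordOverlapClassifier implements TopicClassifier {
    // TreeMap keeps the topic iteration order deterministic.
    private final Map<String, Set<String>> wordsByTopic = new TreeMap<>();

    @Override
    public void train(TrainingSet trainingSet) {
        wordsByTopic.clear();
        for (TrainingExample example : trainingSet.examples()) {
            for (String topic : example.topics) {
                wordsByTopic.computeIfAbsent(topic, t -> new HashSet<>())
                        .addAll(tokenize(example.text));
            }
        }
    }

    @Override
    public List<String> suggestTopics(String text) {
        Set<String> words = tokenize(text);
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> entry : wordsByTopic.entrySet()) {
            int score = 0;
            for (String word : words) {
                if (entry.getValue().contains(word)) score++;
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best == null ? Collections.<String>emptyList()
                            : Collections.singletonList(best);
    }

    private static Set<String> tokenize(String text) {
        Set<String> words = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) words.add(token);
        }
        return words;
    }
}
```

A classifier that cannot update its model dynamically would simply not need
the TrainingSet side at all.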

The interfaces themselves will be adapted and improved as more
implementations are added. The current API looks somewhat tailored to the
Solr based implementation. If possible, I would like Cross Validation to be
implemented in an implementation-independent way [11].
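
To make "implementation independent" concrete, here is a minimal Java sketch
of k-fold cross validation that knows nothing about the classifier backend.
Everything in it is an assumption for illustration: the Labeled type, the
round-robin fold assignment, the single-topic accuracy measure, and the idea
of passing the backend in as a "trainer" function are not taken from the
existing Stanbol code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: names and signatures are illustrative assumptions.
final class CrossValidationSketch {

    /** A text with its single expected topic. */
    static final class Labeled {
        final String text;
        final String topic;

        Labeled(String text, String topic) {
            this.text = text;
            this.topic = topic;
        }
    }

    /** Splits items into k folds round-robin: item i goes to fold i % k. */
    static <T> List<List<T>> folds(List<T> items, int k) {
        if (k < 2 || k > items.size()) {
            throw new IllegalArgumentException("k must be in [2, items.size()]");
        }
        List<List<T>> folds = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            folds.get(i % k).add(items.get(i));
        }
        return folds;
    }

    /**
     * Runs k-fold cross validation and returns the mean accuracy over folds.
     * The trainer turns a training list into a predictor (text -> topic), so
     * a Solr, OpenNLP or Mahout backed classifier could be plugged in.
     */
    static double crossValidate(List<Labeled> data, int k,
            Function<List<Labeled>, Function<String, String>> trainer) {
        List<List<Labeled>> folds = folds(data, k);
        double accuracySum = 0.0;
        for (int held = 0; held < k; held++) {
            // Train on every fold except the held-out one.
            List<Labeled> train = new ArrayList<>();
            for (int f = 0; f < k; f++) {
                if (f != held) train.addAll(folds.get(f));
            }
            Function<String, String> predictor = trainer.apply(train);
            // Evaluate on the held-out fold only.
            int correct = 0;
            for (Labeled example : folds.get(held)) {
                if (example.topic.equals(predictor.apply(example.text))) correct++;
            }
            accuracySum += (double) correct / folds.get(held).size();
        }
        return accuracySum / k;
    }
}
```

A backend would be plugged in with something like
crossValidate(data, 5, train -> myBackend.train(train)::predict), where
myBackend is whatever classifier implementation is being tested.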

I've divided my work into 4 parts:

1) Different implementations for managing the TrainingSet: In the current
approach, the training set has to be stored in Solr, and users have to
configure which fields are used for training and which fields are used as
categories. It would be nice to have an abstract API for managing a
TrainingSet in Stanbol, independent of the final backend, which could be
Solr or any other storage system.

2) Different implementations of the Classifier:
The current classifier API is also tightly coupled to the current
implementation, so it should be refactored to allow different
implementations based on other frameworks, for instance OpenNLP and Apache
Mahout.

3) Changing the current TopicClassification engine to work with the new
APIs:
The current TopicClassification engine will be adapted to work with the new
APIs.

4) Evaluation support:
Evaluation support is planned to be added.
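
As one possible shape for the evaluation part, here is a small Java sketch
that computes micro-averaged precision, recall and F1 over multi-topic
predictions. The choice of these metrics, the Scores type and the evaluate
signature are my assumptions for illustration; the proposal itself does not
fix which measures will be reported.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: metric choice and names are illustrative assumptions.
final class EvaluationSketch {

    /** Micro-averaged scores over all documents. */
    static final class Scores {
        final double precision;
        final double recall;
        final double f1;

        Scores(double precision, double recall, double f1) {
            this.precision = precision;
            this.recall = recall;
            this.f1 = f1;
        }
    }

    /**
     * Compares expected and predicted topic sets document by document.
     * expected.get(i) and predicted.get(i) belong to the same document.
     */
    static Scores evaluate(List<Set<String>> expected, List<Set<String>> predicted) {
        if (expected.size() != predicted.size()) {
            throw new IllegalArgumentException("lists must have the same size");
        }
        int truePositives = 0;
        int falsePositives = 0;
        int falseNegatives = 0;
        for (int i = 0; i < expected.size(); i++) {
            for (String topic : predicted.get(i)) {
                if (expected.get(i).contains(topic)) {
                    truePositives++;
                } else {
                    falsePositives++;
                }
            }
            for (String topic : expected.get(i)) {
                if (!predicted.get(i).contains(topic)) falseNegatives++;
            }
        }
        // Define each score as 0 when its denominator would be 0.
        double precision = truePositives + falsePositives == 0
                ? 0.0 : (double) truePositives / (truePositives + falsePositives);
        double recall = truePositives + falseNegatives == 0
                ? 0.0 : (double) truePositives / (truePositives + falseNegatives);
        double f1 = precision + recall == 0.0
                ? 0.0 : 2 * precision * recall / (precision + recall);
        return new Scores(precision, recall, f1);
    }
}
```

Because it only looks at topic sets, the same code could score any
classifier implementation, whatever backend it uses.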

I'm attending Istanbul Technical University, one of the top universities in
Turkey [12]. I am a Senior Software Developer at a company that is
developing the Search Engine of Turkey, where I am the team lead of the
Index&Query and Analytics teams. I use many Apache projects and contribute
to them, including Solr, Lucene and Nutch [13], as well as Hibernate [18].

During my Bachelor's I worked on animal sound discrimination and on
constraint satisfaction problems, using a novel approach based on
multi-dimensional genetic algorithms. In my M.Sc. I am working on Machine
Learning and Information Retrieval on Big Data. I have implemented a
disambiguation framework for an agglutinative language, Turkish. I have
researched and implemented opinion classification of Democrat and
Republican tweets without using user information. In another project I
implemented an algorithm that solves kinematic equations to control a
robotic arm for the RoboCup Rescue competition. I have also done research
on Simultaneous Localization and Mapping for multiple robots at Yildiz
Technical University, Turkey. Developing a chess engine is another project
through which I studied artificial intelligence.

My work experience is also closely tied to academia. I regularly read
academic papers, from older ones to the most recent, and we work with
academics at our company as well. I am also a Java lover. I have developed
a web search API for SolrCloud for our search engine project, and I have
implemented an index quality application and an analytics application. I
am responsible for tuning Solr instances as Java applications and for cache
improvements. Some of the technologies that I have used: “Java, Spring,
Spring Security, Struts2, EJB, JSP, Quartz, Restful Services, Web Services,
JAX-RS, JAX-WS, jQuery, CSS, HTML, HTML5, Angular JS, MySQL, PostgreSQL,
NoSQL Databases, Hibernate, Logback, JPA, J2SE, J2EE, JavaFX, C, SQL,
Oracle, Pascal, Assembly, FreeBSD, Linux, JavaScript, SVN, Maven, Apache
Tomcat, Jetty, Embedded Jetty, Grizzly, Hudson, Atlassian: JIRA, Crowd,
FishEye, Bamboo, Confluence”.

My assigned mentors are *Rupert Westenthaler* and *Andreas Kuckartz*. My
plan is: 2 weeks for refactoring the current API, 1 week for OpenNLP
integration and 2 weeks for Mahout integration. The second part consists of
changing the current TopicClassification engine to work with the new APIs,
different implementations for managing the TrainingSet, and documentation.

I am happy to be a part of such a nice project.

Thanks;
Furkan KAMACI


[1] (2014, March, 20) Available: https://stanbol.apache.org/
[2] (2014, March, 20) Available:
http://www.slideshare.net/fabchrist/what-apachestanbolcandoforyouapacheconeu12-byfchrist
[3] (2014, March, 20) Available:
https://issues.apache.org/jira/browse/STANBOL-1294
[4] (2014, March, 20) Available:
https://issues.apache.org/jira/browse/STANBOL-197
[5] (2014, March, 20) Available: https://mahout.apache.org/
[6] (2014, March, 20) Available: https://opennlp.apache.org/
[7] (2014, March, 20) Available: http://vimeo.com/45633053
[8] (2014, March, 20) Available:
http://search.maven.org/#search|ga|1|stanbol%20topic
[9] (2014, March, 20) Available:
http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetopicannotation
[10] (2014, March, 20) Available:
http://svn.apache.org/repos/asf/stanbol/trunk/development/archetypes/enhancement-engine/
[11] (2014, March, 20) Available:
http://mail-archives.apache.org/mod_mbox/stanbol-dev/201403.mbox/%3CCAA7LAO36croHH-hkUb04SXF2DMu7hWTqhwNA0SzJHiH=1ek...@mail.gmail.com%3E
[12] (2014, March, 20) Available:
http://www.topuniversities.com/universities/istanbul-technical-university/postgrad
[13] (2014, March, 20) Available:
https://issues.apache.org/jira/secure/ViewProfile.jspa?name=kamaci
[14] (2014, March, 20) Available: http://search-lucene.com
[15] (2014, March, 20) Available:
http://lucene.472066.n3.nabble.com/Seeking-New-Moderators-for-solr-user-lucene-td4096447.html
[16] (2014, March, 20) Available:
http://www.amazon.com/gp/product/1449359957/ref=pdp_new_dp_review
[17] (2014, March, 20) Available: http://www.manning.com/urma/
[18] (2014, March, 21) Available:
https://hibernate.atlassian.net/secure/ViewProfile.jspa?name=kamaci
