Hi Sebastian, thanks for the invitation and for setting this up.

Hello everybody,
I am so glad to be on board.

About me:
I'm currently a grad student (master's) at the Univ. of Southern California (USC), Los Angeles, where I was fortunate enough to meet Professor Chris Mattmann. Prior to my grad studies, I worked as a full-stack developer at a few startups in Bangalore, India. I am also a tech co-founder of a text analysis platform, http://datoin.com. I found my interest in A.I., so here I am at USC grad school. I am on my way to an internship at NASA JPL this summer.

How I met Nutch:
In 2014, my team at Datoin.com and I integrated a crawler/input component into our platform. We picked Nutch because we had the rest of the platform on Hadoop. Boom! That was when I first put my hands on Nutch code. Last fall I took a graduate-level Information Retrieval (IR) course at USC taught by Prof. Mattmann. I then joined hands with his team at NASA JPL to work on IR-related projects. We use and improve Nutch.

Some of my recent work related to Nutch:
- Added an extension point and an extension to pass certain external URLs when db.ignore.external is set.
- Fixed bugs and improved the common crawl dumper.
- A clustering toolkit for clustering Nutch output based on CSS styles and DOM structures [2].
... More coming soon this summer!

I am interested in after-crawl analysis and in bringing the results back to Nutch as extensions. I also presented "Clustering the output of Nutch ...." at the recent ApacheCon NA [1].

I would also love to work on these:
- reusable JVM containers to make it fast and efficient. *Thinking of a Spark execution backend* (A step ahead - a switchable execution backend to support MR and Spark, just like what Gora did for the storage backend).
- stats and analytics of crawl jobs in real time

I am excited to be involved with the community to improve Nutch.

Thanks and Regards,
Thamme

[1] http://www.slideshare.net/thammegowda/clustering-output-of-apache-nutch-using-apache-spark
[2] https://github.com/uscdataScience/autoextractor/wiki/Clustering-Tutorial

--
*Thamme Gowda*
Grad Student at USC <http://usc.edu>
@thammegowda <https://twitter.com/thammegowda> | 213-536-3552
http://scf.usc.edu/~tnarayan/

On Sun, May 22, 2016 at 1:02 PM, Sebastian Nagel <[email protected]> wrote:

> Dear all,
>
> it is my pleasure to announce that Thamme Gowda N. has joined us
> as committer and member of the Nutch PMC. Congratulations on your
> new role within the Apache Nutch community!
>
> Thamme, would you mind telling us about yourself, your relation
> to Nutch, what you've done so far, etc.?
>
> Cheers and welcome on board!
>
> Sebastian (on behalf of the Nutch PMC)
>

