Hi Sebastian,
 Thanks for the invitation and for setting this up.

Hello everybody,

I am so glad to be on board.

About me:
  I'm currently a grad student (master's) at the Univ. of Southern California
(USC), Los Angeles, where I was fortunate enough to meet Professor Chris
Mattmann.
Prior to my grad studies, I worked as a full-stack developer at a few
startups in Bangalore, India. I am also a tech co-founder of a text
analysis platform, http://datoin.com. I found my interest in A.I., so here I
am at USC grad school. I am on my way to an internship at NASA JPL this
summer.

How I met Nutch:
 In 2014, my team at Datoin.com and I integrated a crawler/input component
into our platform. We picked Nutch because we had the rest of the platform on
Hadoop. Boom! That was when I first got my hands on the Nutch code.
 Last fall I took a graduate-level Information Retrieval (IR) course at USC
taught by Prof. Mattmann. I then joined hands with his team at NASA JPL to
work on IR-related projects. We use and improve Nutch.

Some of my recent work related to Nutch:
I added an extension point and an extension to let certain external URLs
through when db.ignore.external is set. I fixed bugs in and improved the
Common Crawl dumper. I also built a clustering toolkit for clustering Nutch
output based on CSS styles and DOM structures [2]...

More coming soon this summer!

I am interested in after-crawl analysis and in bringing the results back to
Nutch as extensions.
I also presented "Clustering the output of Nutch ...." at the recent ApacheCon
NA [1].

I would also love to work on these:

   - reusable JVM containers to make Nutch faster and more efficient.
   *Thinking of a Spark execution backend* (a step further: a switchable
   execution backend to support MR and Spark, just like what Gora did for
   the storage backend).
   - real-time stats and analytics for crawl jobs

I am excited to be involved with the community to improve Nutch.

-
Thanks and Regards,
Thamme

[1]
http://www.slideshare.net/thammegowda/clustering-output-of-apache-nutch-using-apache-spark
[2] https://github.com/uscdataScience/autoextractor/wiki/Clustering-Tutorial


--
*Thamme Gowda *
Grad Student at USC <http://usc.edu>
@thammegowda <https://twitter.com/thammegowda> | 213-536-3552
http://scf.usc.edu/~tnarayan/

On Sun, May 22, 2016 at 1:02 PM, Sebastian Nagel <[email protected]
> wrote:

> Dear all,
>
> it is my pleasure to announce that Thamme Gowda N. has joined us
> as committer and member of the Nutch PMC.  Congratulations on your
> new role within the Apache Nutch community!
>
> Thamme, would you mind telling us about yourself, your relation
> to Nutch, what you've done so far, etc.?
>
> Cheers and welcome on board!
>
> Sebastian (on behalf of the Nutch PMC)
>
