Hi all. I have added my proposal to the Nutch wiki. You can see the details of "Sitemap Crawler" here [1].
[1] https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler

--
Kind Regards

2015-05-19 1:19 GMT+03:00 Cihad Guzel <[email protected]>:

> Hi all,
>
> I want to introduce myself.
>
> I am a Computer Engineer and I am currently doing my master's degree. I like
> coding. I have been following some open source projects for about 3 years.
> I am aiming to make some contributions to the open source community
> through GSoC.
>
> I have also worked on frontend, middleware, and backend development with
> enterprise Java technologies. Furthermore, I have experience with
> "Web Technologies", "Search Technologies", "Cloud Computing",
> "Distributed Systems" and "Big Data". I took part in a search engine
> project that used Apache technologies such as Solr, HBase, Hadoop, Nutch
> and Gora, and I used Nutch actively in that project. You can find more
> information about me on my LinkedIn profile [1].
>
> Here is some information about my project. My subject is "Nutch-1741 -
> Support of Sitemaps in Nutch 2.x" [2]. As you know, the Nutch crawler can
> discover URLs only from pages that it has already scanned. Also, the
> importance and "change frequency" of these URLs are not known, only
> guessed. However, it is possible to find all of a site's URLs in an
> up-to-date sitemap file. For this reason, the sitemap files of a website
> should be crawled.
>
> I have explained the features of this project in my proposal. I'll add it
> to the wiki and share the link once it is there. You can see the Nutch
> sitemap lifecycle in the drawing [3].
>
> [1] https://tr.linkedin.com/in/cihadguzel
> [2] https://issues.apache.org/jira/browse/NUTCH-1741
> [3] https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>
> Kind Regards
>
> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <[email protected]>:
>
>> Ok Lewis,
>> I signed up for the wiki. My wiki username is: cihadguzel
>>
>> Thanks
>>
>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <[email protected]>:
>>
>>> Fantastic Cihad,
>>> Thank you for introducing yourself.
>>> As you are in the community bonding period right now, please feel free
>>> to provide your wiki username to me and I will grant you access to the wiki.
>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>
>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>> Thanks
>>> Lewis
>>>
>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I applied to GSoC 2015 for the Apache Nutch project and my application
>>>> has been accepted. The main reason I chose the Nutch project for GSoC
>>>> is that I know Nutch closely. My subject is "Nutch-1741 - Support of
>>>> Sitemaps in Nutch 2.x" [1]. Thanks to Lewis John McGibbney and Talat
>>>> Uyarer for being my mentors in this process. I hope I can contribute
>>>> to this project.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>
>>>> Kind Regards
>>>
>>> --
>>> *Lewis*

