Superb, Cihad! This will make it easy to follow your work.

2015-05-25 9:53 GMT+03:00 Cihad Guzel <[email protected]>:
> Hi all,
>
> I forked Nutch on my GitHub account [1], so you can see my upcoming commits.
>
> [1] https://github.com/cguzel/nutch
>
> --
> Kind Regards
> Cihad Güzel
>
> 2015-05-20 23:50 GMT+03:00 Cihad Guzel <[email protected]>:
>>
>> Hi all.
>>
>> I have added my proposal to the Nutch wiki. You can see the details of
>> "Sitemap Crawler" here [1].
>>
>> [1] https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler
>>
>> --
>> Kind Regards
>>
>>
>> 2015-05-19 1:19 GMT+03:00 Cihad Guzel <[email protected]>:
>>>
>>> Hi all,
>>>
>>> I want to introduce myself.
>>>
>>> I am a Computer Engineer and I am currently doing a master's degree.
>>> I like coding. I have been following some open source projects for
>>> about 3 years. I aim to contribute to the open source community
>>> through GSoC.
>>>
>>> I have also worked on frontend, middleware, and backend development
>>> with enterprise Java technologies. Furthermore, I have experience in
>>> “Web Technologies”, "Search Technologies", "Cloud Computing",
>>> "Distributed Systems" and "Big Data". I took part in a search engine
>>> project that used Apache technologies such as Solr, HBase, Hadoop,
>>> Nutch, and Gora, and I used Nutch actively in that project. You can
>>> find more information about me on my LinkedIn profile [1].
>>>
>>> Here is some information about my project. My subject is "Nutch-1741 -
>>> Support of Sitemaps in Nutch 2.x" [2]. As you know, the Nutch crawler
>>> can obtain URLs only from pages that have already been crawled. Also,
>>> the importance and “change frequency” of these URLs are not known,
>>> only guessed. However, it is possible to find all of a site's URLs in
>>> an up-to-date sitemap file. For this reason, the sitemap files of a
>>> website should be crawled.
>>>
>>> I have explained the features of this project in my proposal. I'll add
>>> it to the wiki, and you will be able to see the details there once I
>>> share it. You can see the Nutch sitemap lifecycle in the drawing [3].
>>>
>>> [1] https://tr.linkedin.com/in/cihadguzel
>>>
>>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>>
>>> [3]
>>> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>>
>>> Kind Regards
>>>
>>>
>>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <[email protected]>:
>>>>
>>>> Ok Lewis,
>>>> I signed up to the wiki. My wiki username: cihadguzel
>>>>
>>>> Thanks
>>>>
>>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <[email protected]>:
>>>>>
>>>>> Fantastic Cihad,
>>>>> Thank you for introducing yourself.
>>>>> As you are in the community bonding period right now, please feel free
>>>>> to provide your wiki username to me and I will grant you access to the
>>>>> wiki.
>>>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>>>
>>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>>> Thanks
>>>>> Lewis
>>>>>
>>>>>
>>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <[email protected]> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I applied to GSoC 2015 for the Apache Nutch project and my
>>>>>> application has been accepted. The main reason I chose the Nutch
>>>>>> project for GSoC is that I know Nutch closely. My subject is
>>>>>> "Nutch-1741 - Support of Sitemaps in Nutch 2.x" [1]. Thanks to
>>>>>> Lewis John McGibbney and Talat Uyarer for being my mentors in this
>>>>>> process. I hope I can contribute to this project.
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>>
>>>>>> Kind Regards
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Lewis
>>>>
>>>>
>>>
>>
>
--
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

