Hi all,

I forked Nutch on my GitHub account [1], so you can follow my next commits.

[1] https://github.com/cguzel/nutch
--
Kind Regards
Cihad Güzel

2015-05-20 23:50 GMT+03:00 Cihad Guzel <[email protected]>:
> Hi all,
>
> I have added my proposal to the Nutch wiki. You can see the details of
> "Sitemap Crawler" here [1].
>
> [1] https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler
>
> --
> Kind Regards
>
>
> 2015-05-19 1:19 GMT+03:00 Cihad Guzel <[email protected]>:
>
>> Hi all,
>>
>> I want to introduce myself.
>>
>> I am a computer engineer and I am currently doing a master's degree. I
>> like coding. I have been following some open source projects for about 3
>> years, and my goal is to contribute to the open source community through
>> GSoC.
>>
>> I have also worked on frontend, middleware, and backend development with
>> enterprise Java technologies. Furthermore, I have experience in "Web
>> Technologies", "Search Technologies", "Cloud Computing", "Distributed
>> Systems" and "Big Data". I took part in a search engine project that used
>> Apache technologies such as Solr, HBase, Hadoop, Nutch and Gora, and I
>> used Nutch actively in that project. You can find more information about
>> me on my LinkedIn profile [1].
>>
>> Here is some information about my project. My subject is "Nutch-1741 -
>> Support of Sitemaps in Nutch 2.x" [2]. As you know, the Nutch crawler can
>> only discover URLs from pages it has already fetched. Also, the
>> importance (priority) and change frequency of these URLs are not known,
>> only guessed. However, an up-to-date sitemap file can list all of a
>> site's URLs. For this reason, the sitemap files of websites should be
>> crawled.
>>
>> I have explained the features of this project in my proposal. I'll add
>> it to the wiki, and you can see its details there once I share it. You
>> can see the Nutch sitemap lifecycle in the drawing [3].
>>
>> [1] https://tr.linkedin.com/in/cihadguzel
>>
>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>
>> [3] https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>
>> Kind Regards
>>
>>
>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <[email protected]>:
>>
>>> OK Lewis,
>>> I signed up to the wiki. My wiki username is cihadguzel.
>>>
>>> Thanks
>>>
>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <[email protected]>:
>>>
>>>> Fantastic Cihad,
>>>> Thank you for introducing yourself.
>>>> As you are in the community bonding period right now, please feel free
>>>> to provide your wiki username to me and I will grant you access to the
>>>> wiki.
>>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1:
>>>>
>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>> Thanks
>>>> Lewis
>>>>
>>>>
>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I applied to GSoC 2015 for the Apache Nutch project, and my
>>>>> application was accepted. The main reason I chose the Nutch project
>>>>> for GSoC is that I know Nutch closely. My subject is "Nutch-1741 -
>>>>> Support of Sitemaps in Nutch 2.x" [1]. Thanks to Lewis John McGibbney
>>>>> and Talat Uyarer for being my mentors in this process. I hope I can
>>>>> contribute to this project.
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>
>>>>> Kind Regards
>>>>
>>>> --
>>>> *Lewis*
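The sitemap motivation in the proposal above rests on the fact that a sitemap lists a site's URLs together with optional hints (lastmod, changefreq, priority) that a link-following crawler can otherwise only guess. As a rough illustration only, here is a minimal Python sketch (not Nutch's Java code and not part of the proposal) that reads those fields from a sitemaps.org <urlset> file. The names parse_urlset and SitemapEntry and the example.com URLs are hypothetical; the sketch ignores sitemap index files, gzipped sitemaps and size limits, and applies the protocol's default priority of 0.5 when <priority> is omitted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# XML namespace defined by the sitemaps.org 0.9 protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


@dataclass
class SitemapEntry:  # hypothetical container, not a Nutch class
    loc: str
    lastmod: Optional[str]
    changefreq: Optional[str]
    priority: float  # sitemaps.org default is 0.5 when omitted


def parse_urlset(xml_text: str) -> List[SitemapEntry]:
    """Return one SitemapEntry per <url> element of a <urlset> document."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = (url.findtext("sm:loc", default="", namespaces=SITEMAP_NS) or "").strip()
        if not loc:
            continue  # <loc> is required; skip malformed entries
        priority_text = url.findtext("sm:priority", namespaces=SITEMAP_NS)
        entries.append(SitemapEntry(
            loc=loc,
            lastmod=url.findtext("sm:lastmod", namespaces=SITEMAP_NS),
            changefreq=url.findtext("sm:changefreq", namespaces=SITEMAP_NS),
            priority=float(priority_text) if priority_text else 0.5,
        ))
    return entries


# Hypothetical sample sitemap for illustration
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><changefreq>daily</changefreq><priority>1.0</priority></url>
  <url><loc>http://example.com/about</loc><lastmod>2015-05-01</lastmod></url>
</urlset>"""

if __name__ == "__main__":
    # Visit higher-priority URLs first, as a crawler might when seeding
    for e in sorted(parse_urlset(SAMPLE), key=lambda e: e.priority, reverse=True):
        print(e.loc, e.changefreq, e.priority)
```

Sorting by priority is just one way a crawler could use these hints; how the sitemap crawler actually schedules URLs is described in the proposal and lifecycle drawing, not here.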

