Re: Nutch-1741 in GSOC 2015

Cihad Guzel Sun, 24 May 2015 23:55:43 -0700

Hi all,

I have forked Nutch on my GitHub account [1], so you can follow my upcoming commits there.
[1] https://github.com/cguzel/nutch


--
Kind Regards
Cihad Güzel

2015-05-20 23:50 GMT+03:00 Cihad Guzel <[email protected]>:

> Hi all.
>
> I have added my proposal to the Nutch wiki. You can see the details of the
> "Sitemap Crawler" here [1].
>
> [1]  https://wiki.apache.org/nutch/GoogleSummerOfCode/SitemapCrawler
>
> --
> Kind Regards
>
>
> 2015-05-19 1:19 GMT+03:00 Cihad Guzel <[email protected]>:
>
>> Hi all,
>>
>> I want to introduce myself.
>>
>> I am a Computer Engineer and I am currently doing a master's degree. I like
>> coding. I have been following some open source projects for about 3 years.
>> I aim to contribute to the open source community through GSoC.
>>
>> I have also worked on frontend, middleware, and backend development with
>> enterprise Java technologies. Furthermore, I have experience with "Web
>> Technologies", "Search Technologies", "Cloud Computing", "Distributed
>> Systems" and "Big Data". I took part in a search engine project that used
>> Apache technologies such as Solr, HBase, Hadoop, Nutch, and Gora, and I used
>> Nutch actively in that project. You can find more information about me on
>> my LinkedIn profile [1].
>>
>> Here is some information about my project. My subject is "Nutch-1741 -
>> Support of Sitemaps in Nutch 2.x" [2]. As you know, the Nutch crawler can
>> only obtain URLs from pages that it has already scanned. Also, the
>> importance and change frequency of these URLs are not known, only guessed.
>> However, it is possible to find all of a site's URLs in an up-to-date
>> sitemap file. For this reason, the sitemap files of a website should be
>> crawled.
>>
>> I have explained the features of this project in my proposal. I'll add it
>> to the wiki, and you can see the details there once I share it. You can see
>> the Nutch sitemap lifecycle in the drawing [3].
>>
>> [1] https://tr.linkedin.com/in/cihadguzel
>>
>> [2] https://issues.apache.org/jira/browse/NUTCH-1741
>>
>> [3]
>> https://issues.apache.org/jira/secure/attachment/12707721/SitemapCrawlerLifeCycle.pdf
>>
>> Kind Regards
>>
>>
>> 2015-05-19 1:16 GMT+03:00 Cihad Guzel <[email protected]>:
>>
>>> Ok Lewis,
>>> I have signed up for the wiki. My wiki username is cihadguzel.
>>>
>>> Thanks
>>>
>>> 2015-05-18 23:44 GMT+03:00 Lewis John Mcgibbney <
>>> [email protected]>:
>>>
>>>> Fantastic Cihad,
>>>> Thank you for introducing yourself.
>>>> As you are in the community bonding period right now, please feel free
>>>> to provide your wiki username to me and I will grant you access to the 
>>>> wiki.
>>>> Please also feel free to pick up some lingering issues for Nutch 2.3.1
>>>>
>>>> https://issues.apache.org/jira/browse/NUTCH-1945?jql=project%20%3D%20NUTCH%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%202.3.1%20ORDER%20BY%20priority%20DESC
>>>> Thanks
>>>> Lewis
>>>>
>>>>
>>>> On Mon, May 18, 2015 at 1:26 PM, Cihad Guzel <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I applied to GSoC 2015 for the Apache Nutch project and my
>>>>> application was accepted. The main reason I chose the Nutch project for
>>>>> GSoC is that I know Nutch closely. My subject is "Nutch-1741 - Support
>>>>> of Sitemaps in Nutch 2.x" [1]. Thanks to Lewis John McGibbney and Talat
>>>>> Uyarer for being my mentors in this process. I hope I can contribute to
>>>>> this project.
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/NUTCH-1741
>>>>>
>>>>> Kind Regards
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Lewis*
>>>>
>>>
>>>
>>
>
