Hi Talat,

Comments below:
> NUTCH-1753 Eclipse dependecy problem for 2.x
=> trivial, please see my comments on it

> NUTCH-1748 urlfilter-validator to allow .. (two dots) inside file names (path elements)
=> still under discussion - leave it for 2.4

> NUTCH-1740 BatchId parameter is not set in DbUpdaterJob
=> duplicate

> NUTCH-1728 indexer-solr plugin is not delete docs from solr
=> trivial enough to be committed for 2.3

> NUTCH-1725 CleaningJob's reducer does not commit deleted docs.
=> trivial enough to be committed for 2.3

> NUTCH-1662 NUTCH-1568 Indexer Plugin for Solr Cloud
=> I think we did something pretty similar in 1.x and would like to make sure that both versions are as similar as possible.

> NUTCH-1661 Language based crawling
=> This is definitely not being committed. You haven't replied to Otis's questions and this has to be properly reviewed first and discussed.

> NUTCH-1660 Index filter for Page's latitude and longitude
=> same. You haven't replied to the comments on this one.

> NUTCH-1657 ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION never set in HTMLParser
=> trivial indeed, +1 thanks

> NUTCH-1643 Unnecessary fetching with http.content.limit when using protocol-http
=> needs reviewing first, let's leave it for later

> NUTCH-1618 Fetches some websites multiple times for long lasting queues
=> trivial indeed, please change the title to something more explicit like "Turn speculative execution off for Fetching"

I have added NUTCH-1679 <https://issues.apache.org/jira/browse/NUTCH-1679> (UpdateDb using batchId, link may override crawled page.) to 2.3 as it must be fixed ASAP.

Thanks for pointing out these issues. I think the focus for 2.3 should be to get everything as robust as possible, we can always add new functionalities in another release after that ("release often" etc...).
One thing we should definitely have though is to leverage the brand new GORA filtering so that we get only the entries marked for a given job - see discussion on NUTCH-1714 <https://issues.apache.org/jira/browse/NUTCH-1714>. This should make Nutch 2.x a lot faster.

We haven't released 2.x for some time and loads of interesting stuff has been done to it. It will be an exciting release!

Thanks for your contributions and pushing things forward!

Julien

> 2014-05-01 11:32 GMT+03:00 Julien Nioche <[email protected]>:
>
>> Hi Talat
>>
>> Not clear what you mean here. "I need them" is not really an explanation
>> as to why they should be part of the next release. [If you want your own
>> repository then open an account on GitHub (or somewhere else) and clone the
>> 2.x branch to add the patches of your choice].
>>
>> Lewis suggested a roadmap for the next release and the changes he made
>> reflect his suggestions. If you think some of the issues should be part of
>> the 2.3 release then please explain why. BTW I don't think you agree with
>> me as I was suggesting we stick to the ones already listed minus 1741.
>>
>> Thanks
>>
>> Julien
>>
>> On 1 May 2014 08:40, Talat Uyarer <[email protected]> wrote:
>>
>>> I agree with you Julien. Today Lewis changed some issues' fix version
>>> from 2.3 to 2.4. Most of my issues :) May I ask, if I update these issues, can
>>> I change the fix version to 2.3? I need them.
>>>
>>> Thanks
>>> Talat
>>>
>>> 2014-05-01 9:47 GMT+03:00 Julien Nioche <[email protected]>:
>>>
>>>> I'd exclude NUTCH-1741 for now and focus on the core updates (GORA,
>>>> filters, etc...).
>>>> See comments on
>>>> NUTCH-1714 <https://issues.apache.org/jira/browse/NUTCH-1714>
>>>>
>>>> On 1 May 2014 07:27, Lewis John Mcgibbney <[email protected]> wrote:
>>>>
>>>>> Hi Alparslan & Folks,
>>>>>
>>>>> OK so you can see the road map's here
>>>>>
>>>>> http://s.apache.org/Xqk
>>>>>
>>>>> As you can see in 2.3 development drive we've addressed 66 of 71
>>>>> issues. The remainders being as follows
>>>>>
>>>>> NUTCH-1741 <https://issues.apache.org/jira/browse/NUTCH-1741>
>>>>>   Support of Sitemaps in Nutch 2.x
>>>>> NUTCH-1714 <https://issues.apache.org/jira/browse/NUTCH-1714>
>>>>>   Nutch 2.x upgrade to Gora 0.4
>>>>> NUTCH-1709 <https://issues.apache.org/jira/browse/NUTCH-1709>
>>>>>   Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus
>>>>>   contain methods not defined in source .avsc
>>>>> NUTCH-1674 <https://issues.apache.org/jira/browse/NUTCH-1674>
>>>>>   Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index
>>>>> NUTCH-1570 <https://issues.apache.org/jira/browse/NUTCH-1570>
>>>>>   Add filtering capability to Datastore Queries
>>>>>
>>>>> I think if we addressed the above then we could push an RC.
>>>>> Any comments?
>>>>> I'll be able to crack on with this final push relatively soon.
>>>>>
>>>>> On Tue, Apr 29, 2014 at 1:09 PM, <[email protected]> wrote:
>>>>>
>>>>>> I think we can also add
>>>>>> https://issues.apache.org/jira/browse/NUTCH-1674. This issue was
>>>>>> waiting the stable release of gora-0.4.
>>>>>>
>>>>>> And IMHO, we can add https://issues.apache.org/jira/browse/NUTCH-1741,
>>>>>> if anyone could review and test it.
>>>>>>
>>>>>> Thanks,
>>>>>> Alparslan
>>>>
>>>> --
>>>> Open Source Solutions for Text Engineering
>>>>
>>>> http://digitalpebble.blogspot.com/
>>>> http://www.digitalpebble.com
>>>> http://twitter.com/digitalpebble
>>>
>>> --
>>> Talat UYARER
>>> Website: http://talat.uyarer.com
>>> Twitter: http://twitter.com/talatuyarer
>>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>
>> --
>> Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>> http://twitter.com/digitalpebble
>
> --
> Talat UYARER
> Website: http://talat.uyarer.com
> Twitter: http://twitter.com/talatuyarer
> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

