Thank you, Julien. I agree that we should make our releases as robust as possible. I will work on your comments.
Talat

2014-05-01 19:32 GMT+03:00 Julien Nioche <[email protected]>:

> Hi Talat,
>
> Comments below :
>
> NUTCH-1753 Eclipse dependecy problem for 2.x
>>
>
> => trivial, please see my comments on it
>
>> NUTCH-1748 urlfilter-validator to allow .. (two dots) inside file names
>> (path elements)
>>
>
> => still under discussion - leave it for 2.4
>
>> NUTCH-1740 BatchId parameter is not set in DbUpdaterJob
>>
>
> => duplicate
>
>> NUTCH-1728 indexer-solr plugin is not delete docs from solr
>>
>
> => trivial enough to be committed for 2.3
>
>> NUTCH-1725 CleaningJob's reducer does not commit deleted docs.
>>
>
> => trivial enough to be committed for 2.3
>
>> NUTCH-1662 NUTCH-1568 Indexer Plugin for Solr Cloud
>>
>
> => I think we did something pretty similar in 1.x and would like to make
> sure that both versions are as similar as possible.
>
>> NUTCH-1661 Language based crawling
>>
>
> => This is definitely not being committed. You haven't replied to Otis's
> questions and this has to be properly reviewed first and discussed.
>
>> NUTCH-1660 Index filter for Page's latitude and longitude
>>
>
> => same. You haven't replied to the comments on this one.
>
>> NUTCH-1657 ORIGINAL_CHAR_ENCODING and CHAR_ENCODING_FOR_CONVERSION never
>> set in HTMLParser
>>
>
> => trivial indeed, +1 thanks
>
>> NUTCH-1643 Unnecessary fetching with http.content.limit when using
>> protocol-http
>>
>
> => needs reviewing first, let's leave it for later
>
>> NUTCH-1618 Fetches some websites multiple times for long lasting queues
>>
>
> => trivial indeed, please change the title to something more explicit like
> "Turn speculative execution off for Fetching"
>
> I have added NUTCH-1679 <https://issues.apache.org/jira/browse/NUTCH-1679>
> (UpdateDb using batchId, link may override crawled page.) to 2.3 as it
> must be fixed ASAP.
>
> Thanks for pointing out these issues. I think the focus for 2.3 should be
> to get everything as robust as possible, we can always add new
> functionalities in another release after that ("release often" etc...). One
> thing we should definitely have though is to leverage the brand new GORA
> filtering so that we get only the entries marked for a given job - see
> discussion on NUTCH-1714. This should make Nutch 2.x a lot faster.
>
> We haven't released 2.x for some time and loads of interesting stuff has
> been done to it. It will be an exciting release!
>
> Thanks for your contributions and pushing things forward!
>
> Julien
>
>>
>> 2014-05-01 11:32 GMT+03:00 Julien Nioche <[email protected]>:
>>
>> Hi Talat
>>>
>>> Not clear what you mean here. "I need them" is not really an explanation
>>> as to why they should be part of the next release. [If you want your own
>>> repository then open an account on GitHub (or somewhere else) and clone the
>>> 2.x branch to add the patches of your choice].
>>>
>>> Lewis suggested a roadmap for the next release and the changes he made
>>> reflect his suggestions. If you think some of the issues should be part of
>>> the 2.3 release then please explain why. BTW I don't think you agree with
>>> me as I was suggesting we stick to the ones already listed minus 1741.
>>>
>>> Thanks
>>>
>>> Julien
>>>
>>> On 1 May 2014 08:40, Talat Uyarer <[email protected]> wrote:
>>>
>>>> I aggree with you Julien. Today Lewis change some issues's fix version
>>>> 2.3 to 2.4. Most of my issues :) May I ask, If I update these issues, can
>>>> I change fix version to 2.3 ? I need them.
>>>>
>>>> Thanks
>>>> Talat
>>>>
>>>> 2014-05-01 9:47 GMT+03:00 Julien Nioche <[email protected]>:
>>>>
>>>> I'd exclude NUTCH-1741 for now and focus on the core updates (GORA,
>>>>> filters, etc...). See comments on
>>>>> NUTCH-1714 <https://issues.apache.org/jira/browse/NUTCH-1714>
>>>>>
>>>>> On 1 May 2014 07:27, Lewis John Mcgibbney
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi Alparslan & Folks,
>>>>>>
>>>>>> OK so you can see the road map's here
>>>>>>
>>>>>> http://s.apache.org/Xqk
>>>>>>
>>>>>> As you can see in 2.3 development drive we've addressed 66 of 71
>>>>>> issues. The remainders being as follows
>>>>>>
>>>>>> NUTCH-1741 <https://issues.apache.org/jira/browse/NUTCH-1741>
>>>>>> Support of Sitemaps in Nutch 2.x
>>>>>>
>>>>>> NUTCH-1714 <https://issues.apache.org/jira/browse/NUTCH-1714>
>>>>>> Nutch 2.x upgrade to Gora 0.4
>>>>>>
>>>>>> NUTCH-1709 <https://issues.apache.org/jira/browse/NUTCH-1709>
>>>>>> Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus
>>>>>> contain methods not defined in source .avsc
>>>>>>
>>>>>> NUTCH-1674 <https://issues.apache.org/jira/browse/NUTCH-1674>
>>>>>> Use batchId filter to enable scan (GORA-119) for
>>>>>> Fetch,Parse,Update,Index
>>>>>>
>>>>>> NUTCH-1570 <https://issues.apache.org/jira/browse/NUTCH-1570>
>>>>>> Add filtering capability to Datastore Queries
>>>>>>
>>>>>> I think if we addressed the above then we could push an RC.
>>>>>> Any comments?
>>>>>> I'll be able to crack on with this final push relatively soon.
>>>>>>
>>>>>> On Tue, Apr 29, 2014 at 1:09 PM, <[email protected]> wrote:
>>>>>>
>>>>>>> I think we can also add
>>>>>>> https://issues.apache.org/jira/browse/NUTCH-1674. This issue was
>>>>>>> waiting the stable release of gora-0.4.
>>>>>>>
>>>>>>> And IMHO, we can add
>>>>>>> https://issues.apache.org/jira/browse/NUTCH-1741, if anyone could
>>>>>>> review and test it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Alparslan
>>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Open Source Solutions for Text Engineering
>>>>>
>>>>> http://digitalpebble.blogspot.com/
>>>>> http://www.digitalpebble.com
>>>>> http://twitter.com/digitalpebble
>>>>>
>>>>
>>>> --
>>>> Talat UYARER
>>>> Websitesi: http://talat.uyarer.com
>>>> Twitter: http://twitter.com/talatuyarer
>>>> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>>>>

--
Talat UYARER
Websitesi: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

