Yes, please share. It would be useful.

On Sep 25, 2014 8:54 PM, "Talat Uyarer" <[email protected]> wrote:
> Last thing: I wrote a how-to-use-it document. :)
> On Sep 26, 2014 6:52 AM, "Talat Uyarer" <[email protected]> wrote:
>
>> Hi all,
>>
>> I made some changes to Emir's plugin to make it compatible with 2.x.
>> That is useful. If you need it, I can share my fork.
>>
>> Talat
>> On Sep 26, 2014 6:47 AM, "Nima Falaki" <[email protected]> wrote:
>>
>>> Hi:
>>>
>>> Yes, it would be very interesting. Let me know what Emir says.
>>>
>>> Nima
>>>
>>> On Thu, Sep 25, 2014 at 12:43 PM, Albinscode <[email protected]>
>>> wrote:
>>>
>>>> Oh thanks Nima, I did find this topic last year but I thought the
>>>> project was dead. I think there is a little reference in the nutch
>>>> wiki too, but I cannot find it now.
>>>>
>>>> It looks like we have the same xsl approach so it can be interesting to
>>>> share. I'll try to contact Emir while continuing documenting my small
>>>> plugin.
>>>>
>>>> Thanks again for the valuable information!
>>>>
>>>> 2014-09-25 19:19 GMT+02:00 Nima Falaki <[email protected]>:
>>>>
>>>>> And the reason why I think this is because of this ticket (look at the
>>>>> conversation at the bottom between Emmanuel and Lewis John):
>>>>>
>>>>> https://issues.apache.org/jira/browse/NUTCH-978
>>>>>
>>>>> On Thu, Sep 25, 2014 at 8:44 AM, Nima Falaki <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Julien:
>>>>>>
>>>>>> I was under the impression that the nutch community was going to use
>>>>>> a generic xls parser? This one:
>>>>>> http://www.atlantbh.com/precise-data-extraction-with-apache-nutch/
>>>>>> Is the nutch community going to use this?
>>>>>>
>>>>>> On Thu, Sep 25, 2014 at 5:49 AM, Julien Nioche <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Albin,
>>>>>>>
>>>>>>> You don't have to have a separate plugin for each html structure you
>>>>>>> want to parse. You can have a single plugin with multiple
>>>>>>> HTMLParseFilters.
>>>>>>>
>>>>>>> Having a generic extractor with the extraction logic configured in
>>>>>>> an external file is definitely a good idea and would make a great
>>>>>>> contribution to the project. In a nutshell, you haven't missed
>>>>>>> anything and that wheel definitely needs inventing ;-)
>>>>>>>
>>>>>>> Best
>>>>>>>
>>>>>>> Julien
>>>>>>>
>>>>>>> On 25 September 2014 09:24, Albin Vigier <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello everybody,
>>>>>>>>
>>>>>>>> I'm just wondering if it is possible to fetch specific metadata with
>>>>>>>> an existing nutch plugin.
>>>>>>>>
>>>>>>>> Let's take an example.
>>>>>>>> I want to extract some metadata from "div" or "td" tags from html
>>>>>>>> pages that have specific ids and name them the way I like (this is
>>>>>>>> done at parser time).
>>>>>>>> Then, at indexer time, I would use index-metadata (a very good
>>>>>>>> plugin) to add my custom metadata.
>>>>>>>>
>>>>>>>> Currently, from what I've seen on the wiki and by quickly analyzing
>>>>>>>> plugins, I suppose I have to code my own plugin each time I've got a
>>>>>>>> new site (with a new html structure). I've already done that by
>>>>>>>> using a node walker in a custom htmlParseFilter but the extraction
>>>>>>>> can be a little bit boring :)
>>>>>>>>
>>>>>>>> So on my side I've coded a little plugin that enables me to specify
>>>>>>>> xpaths in an xml file. But before diving into more functionalities
>>>>>>>> I'm just wondering if I did not miss something.
>>>>>>>> This work allowed me to explore some nutch aspects but I don't want
>>>>>>>> to reinvent the wheel or miss something.
>>>>>>>>
>>>>>>>> Albin
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Open Source Solutions for Text Engineering
>>>>>>>
>>>>>>> http://digitalpebble.blogspot.com/
>>>>>>> http://www.digitalpebble.com
>>>>>>> http://twitter.com/digitalpebble
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Nima Falaki
>>>>>> Software Engineer
>>>>>> [email protected]
>>>>>>
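
For anyone following the thread, here is a rough sketch of the "xpaths in an xml file" idea discussed above. It is not Albin's, Emir's, or Talat's actual code. It uses only the JDK (no Nutch classes), and the `<field name="..." xpath="..."/>` rule format, the class name, and the method names are all made up for illustration. It also assumes the page is well-formed XHTML; inside a real parse filter you would evaluate the XPaths against the DOM that Nutch hands you instead.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical standalone sketch: extraction rules live in an external XML
// document, so a new site layout only needs new rules, not a new plugin.
class XPathMetadataSketch {

    // Parses a well-formed XML (or XHTML) string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Reads <field name="..." xpath="..."/> elements (an invented format)
    // into a map of metadata name -> XPath expression.
    static Map<String, String> loadRules(String rulesXml) {
        Map<String, String> rules = new LinkedHashMap<>();
        NodeList fields = parse(rulesXml).getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            rules.put(field.getAttribute("name"), field.getAttribute("xpath"));
        }
        return rules;
    }

    // Evaluates every rule against the page and keeps the non-empty results.
    static Map<String, String> extract(String xhtml, Map<String, String> rules) {
        Document page = parse(xhtml);
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            try {
                // String evaluation returns the text of the first matching node.
                String value = xpath.evaluate(rule.getValue(), page).trim();
                if (!value.isEmpty()) {
                    metadata.put(rule.getKey(), value);
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath for " + rule.getKey(), e);
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        String rulesXml = "<rules>"
                + "<field name=\"product\" xpath=\"//div[@id='name']\"/>"
                + "<field name=\"price\" xpath=\"//td[@id='price']\"/>"
                + "</rules>";
        String page = "<html><body><div id=\"name\"> Widget </div>"
                + "<table><tr><td id=\"price\">9.99</td></tr></table></body></html>";
        System.out.println(extract(page, loadRules(rulesXml)));
    }
}
```

The extracted map would then be copied into the parse metadata so that index-metadata can pick the fields up at indexer time, as described in Albin's original message.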

