Yes please share. It would be useful.
On Sep 25, 2014 8:54 PM, "Talat Uyarer" <[email protected]> wrote:

> One last thing: I also wrote a how-to document for it. :)
> On Sep 26, 2014 6:52 AM, "Talat Uyarer" <[email protected]> wrote:
>
>> Hi all,
>>
>> I made some changes to Emir's plugin to make it compatible with 2.x. It is
>> useful. If you need it, I can share my fork.
>>
>> Talat
>> On Sep 26, 2014 6:47 AM, "Nima Falaki" <[email protected]> wrote:
>>
>>> Hi:
>>>
>>> Yes, it would be very interesting. Let me know what Emir says
>>>
>>> Nima
>>>
>>> On Thu, Sep 25, 2014 at 12:43 PM, Albinscode <[email protected]>
>>> wrote:
>>>
>>>> Oh thanks Nima, I did find this topic last year but I thought the
>>>> project was dead. I think there is a brief reference in the Nutch wiki
>>>> too, but I cannot find it now.
>>>>
>>>> It looks like we have the same xsl approach, so it could be interesting
>>>> to share. I'll try to contact Emir while I continue documenting my small
>>>> plugin.
>>>>
>>>> Thanks again for the valuable information!
>>>>
>>>> 2014-09-25 19:19 GMT+02:00 Nima Falaki <[email protected]>:
>>>>
>>>>> And the reason why I think this is because of this ticket (Look at the
>>>>> conversation at the bottom between Emmanuel and Lewis John)
>>>>>
>>>>> https://issues.apache.org/jira/browse/NUTCH-978
>>>>>
>>>>> On Thu, Sep 25, 2014 at 8:44 AM, Nima Falaki <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Julien:
>>>>>>
>>>>>> I was under the impression that the Nutch community was going to use
>>>>>> a generic xls parser, namely this one:
>>>>>> http://www.atlantbh.com/precise-data-extraction-with-apache-nutch/
>>>>>> Is the Nutch community going to use this?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 25, 2014 at 5:49 AM, Julien Nioche <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Albin,
>>>>>>>
>>>>>>> You don't have to have a separate plugin for each html structure you
>>>>>>> want to parse. You can have a single plugin with multiple
>>>>>>> HTMLParseFilters.
>>>>>>>
>>>>>>> Having a generic extractor with the extraction logic configured in
>>>>>>> an external file is definitely a good idea and would make a great
>>>>>>> contribution to the project. In a nutshell, you haven't missed anything
>>>>>>> and that wheel definitely needs inventing ;-)
>>>>>>>
>>>>>>> Best
>>>>>>>
>>>>>>> Julien
>>>>>>>
>>>>>>>
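
Julien's point about one plugin carrying several filters can be sketched in plain Java. This is only an illustration under stated assumptions: `PageFilter`, `HostXPathFilter`, `runAll`, the hosts, and the metadata names are all hypothetical, and `PageFilter` is a stand-in, not the Nutch filter interface. The sample page is well-formed XHTML so that the JDK parser can read it.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FilterChainSketch {

    /** Hypothetical stand-in for a parse filter; this is NOT the Nutch interface. */
    public interface PageFilter {
        void filter(String url, Document page, Map<String, String> metadata) throws Exception;
    }

    /** One reusable filter: applies a single XPath, but only to pages from one host. */
    public static class HostXPathFilter implements PageFilter {
        private final String host;
        private final String name;
        private final String expression;

        public HostXPathFilter(String host, String name, String expression) {
            this.host = host;
            this.name = name;
            this.expression = expression;
        }

        @Override
        public void filter(String url, Document page, Map<String, String> metadata) throws Exception {
            // Skip pages that belong to a different site structure.
            if (!host.equals(URI.create(url).getHost())) {
                return;
            }
            String value = XPathFactory.newInstance().newXPath().evaluate(expression, page).trim();
            if (!value.isEmpty()) {
                metadata.put(name, value);
            }
        }
    }

    /** Well-formed sample page (an assumption; real pages usually need a lenient HTML parser). */
    public static final String PAGE =
        "<html><body>"
      + "<div id=\"author\">Jane</div>"
      + "<table><tr><td id=\"price\">9.99</td></tr></table>"
      + "</body></html>";

    /** Two filters living side by side, one per (hypothetical) site structure. */
    public static List<PageFilter> defaultFilters() {
        return Arrays.<PageFilter>asList(
            new HostXPathFilter("shop.example.com", "price", "//td[@id='price']"),
            new HostXPathFilter("news.example.org", "author", "//div[@id='author']"));
    }

    /** Parses the page once and lets every filter contribute metadata in turn. */
    public static Map<String, String> runAll(List<PageFilter> filters, String url, String xhtml)
            throws Exception {
        Document page = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        Map<String, String> metadata = new LinkedHashMap<>();
        for (PageFilter f : filters) {
            f.filter(url, page, metadata);
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAll(defaultFilters(), "http://shop.example.com/item/1", PAGE));
        System.out.println(runAll(defaultFilters(), "http://news.example.org/story/2", PAGE));
    }
}
```

In this sketch each filter decides for itself whether a page concerns it, so adding a new site structure means adding one more entry to the list rather than writing another plugin.
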
>>>>>>> On 25 September 2014 09:24, Albin Vigier <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello everybody,
>>>>>>>>
>>>>>>>> I'm just wondering if it is possible to fetch specific metadata with
>>>>>>>> an existing nutch plugin.
>>>>>>>>
>>>>>>>> Let's take an example.
>>>>>>>> I want to extract some metadata from "div" or "td" tags from html
>>>>>>>> pages that have specific ids and name them the way I like (this is
>>>>>>>> done at parser time).
>>>>>>>> Then, at indexer time, I would use index-metadata (a very good
>>>>>>>> plugin) to add my custom metadata.
>>>>>>>>
>>>>>>>> Currently, from what I've seen on the wiki and by quickly analyzing
>>>>>>>> plugins, I suppose I have to code my own plugin each time I've got a
>>>>>>>> new site (with a new html structure). I've already done that by
>>>>>>>> using a node walker in a custom htmlParseFilter, but the extraction
>>>>>>>> can be a little bit boring :)
>>>>>>>>
>>>>>>>> So on my side I've coded a little plugin that enables me to specify
>>>>>>>> xpaths in an xml file. But before diving into more functionality,
>>>>>>>> I'm just wondering whether I have missed something.
>>>>>>>> This work allowed me to explore some nutch aspects, but I don't want
>>>>>>>> to reinvent the wheel or miss something.
>>>>>>>>
>>>>>>>> Albin
>>>>>>>>
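
The idea Albin describes, XPaths kept in an XML file and mapped to custom metadata names, might look roughly like the sketch below. It uses only the JDK's XPath support. The rules format (`<rules>`/`<field>` with `name` and `xpath` attributes), the class name, and the sample page are assumptions made for illustration; they are not taken from his plugin or from Emir's. The page is well-formed XHTML, which real sites rarely are.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathRulesSketch {

    // Hypothetical rules file content: each <field> maps a metadata name to an XPath.
    public static final String RULES =
        "<rules>"
      + "<field name=\"product.price\" xpath=\"//td[@id='price']\"/>"
      + "<field name=\"product.title\" xpath=\"//div[@id='title']\"/>"
      + "</rules>";

    // Hypothetical, well-formed sample page with the "div" and "td" ids to extract.
    public static final String PAGE =
        "<html><body>"
      + "<div id=\"title\">Blue Widget</div>"
      + "<table><tr><td id=\"price\">9.99</td></tr></table>"
      + "</body></html>";

    /** Parses an XML string into a DOM document. */
    public static Document parseXml(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates every rule against the page and collects the non-empty results by name. */
    public static Map<String, String> extract(Document rules, Document page) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> metadata = new LinkedHashMap<>();
        NodeList fields = rules.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String value = (String) xpath.evaluate(
                field.getAttribute("xpath"), page, XPathConstants.STRING);
            if (!value.trim().isEmpty()) {
                metadata.put(field.getAttribute("name"), value.trim());
            }
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> metadata = extract(parseXml(RULES), parseXml(PAGE));
        metadata.forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

With this layout, supporting a new site would only mean editing the rules file. The extracted names would then still have to be listed in the index-metadata configuration so that they reach the index.
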
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Open Source Solutions for Text Engineering
>>>>>>>
>>>>>>> http://digitalpebble.blogspot.com/
>>>>>>> http://www.digitalpebble.com
>>>>>>> http://twitter.com/digitalpebble
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>> Nima Falaki
>>>>>> Software Engineer
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
