Hi Cihad,
I'll take a look tonight.
My understanding is that this would be implemented as part of core and not
as a plugin. Within a plugin we can, at times, have access to less verbose
data structures. This is of course not always the case, but generally
speaking we see more issues, depending on which interfaces we extend, with
appropriate access to the correct data structures. We then have the issue
of dependency management.
I'll have a look through the various links you have sent and then write
back here in due course.
Apologies about the delay.
Thanks

On Mon, Jul 6, 2015 at 12:20 AM, Cihad Guzel <[email protected]> wrote:

> Hi,
>
> I have found a patch for my metadata problem [1]. But the problem isn't
> solved for 2.x [2]. I guess I need to solve it.
>
> [1] https://issues.apache.org/jira/browse/NUTCH-1622
> [2] https://issues.apache.org/jira/browse/NUTCH-1816
>
> 2015-07-04 15:56 GMT+03:00 Cihad Guzel <[email protected]>:
>
>> Hi Lewis,
>>
>> Talat and I talked about the architecture for sitemap support. We thought
>> the problem could be solved within the Nutch life cycle. We don't want to
>> build a different life cycle for sitemap crawling.
>>
>> So, I have the following problems:
>>
>> If the sitemap file is too large, it cannot be fetched and parsed; it
>> times out. I worked around the timeout temporarily: for parsing, by raising
>> the timeout value in nutch-site.xml, and for fetching, by working with a
>> small file. This is not a good solution.
>>
>> Moreover, as you know, sitemap files have some special tags such as "loc",
>> "lastmod", "changefreq" and "priority". These are parsed by my parse
>> plugin. I want to record them in the crawldb, but the Parse object doesn't
>> support metadata or similar fields. It has only an outlink array, which
>> isn't enough for recording metadata.
>>
>> I want to record each URL in the sitemap file with its metadata separately.
>>
>> I viewed all the patches and comments on NUTCH-1465, and there are some
>> solutions for the same problems in it. But a new job for sitemap crawling
>> has been created.
>>
>> Could you show me a way out?
>>
>> Thanks.
>>
>
>


-- 
*Lewis*
