[
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1934:
----------------------------------------
Attachment: NUTCH-1934.patch
Patch for trunk.
Some early observations:
* Existing Nutch tests pass locally
* The way I have approached this is to make explicit casts to existing
fetchQueue objects, as **FetcherThread** is now an independent class. In my test
crawling I have come across no ClassCastExceptions (as of yet!!!); however, this
is something we should remain vigilant about, e.g.
{code}
((FetchItemQueues) fetchQueues).getTotalSize()
{code}
* We now have a pretty verbose constructor for **FetcherThread** (hey, what's
new, it's the Nutch Fetcher.java); however, this one is verbose even by Nutch
Fetcher.java standards.
{code}
public FetcherThread(Configuration conf, AtomicInteger activeThreads,
    FetchItemQueues fetchQueues, QueueFeeder feeder,
    AtomicInteger spinWaiting, AtomicLong lastRequestStart,
    Reporter reporter, AtomicInteger errors, String segmentName,
    boolean parsing, OutputCollector<Text, NutchWritable> output,
    boolean storingContent, AtomicInteger pages, AtomicLong bytes) {
{code}
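On the cast point above, one possible way to stay vigilant is to guard the cast with an instanceof check, so that a wrong type fails with a descriptive message rather than a bare ClassCastException. This is only a rough sketch and not what the patch does: the class CastGuardSketch, the helper totalSizeOf and the nested FetchItemQueues stand-in are all hypothetical, and only the getTotalSize() call is taken from the snippet above.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CastGuardSketch {

  // Hypothetical stand-in for Nutch's FetchItemQueues, so the sketch compiles
  // on its own. Only the getTotalSize() name comes from the patch notes.
  static class FetchItemQueues {
    private final AtomicInteger totalSize = new AtomicInteger();

    FetchItemQueues(int initialSize) {
      totalSize.set(initialSize);
    }

    int getTotalSize() {
      return totalSize.get();
    }
  }

  // Hypothetical helper: checks the runtime type before casting and reports
  // the offending type if the object is not a FetchItemQueues.
  static int totalSizeOf(Object fetchQueues) {
    if (!(fetchQueues instanceof FetchItemQueues)) {
      String actual =
          fetchQueues == null ? "null" : fetchQueues.getClass().getName();
      throw new IllegalStateException(
          "Expected FetchItemQueues but got " + actual);
    }
    return ((FetchItemQueues) fetchQueues).getTotalSize();
  }

  public static void main(String[] args) {
    // A correctly typed object passes through the guard.
    System.out.println(totalSizeOf(new FetchItemQueues(3)));

    // A wrongly typed object is rejected with a readable message.
    try {
      totalSizeOf("not a queue");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Whether a guard like this is worth the extra lines probably depends on how many call sites end up needing the cast; if there are only a few, the plain cast may be fine.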
Some initial comments would be very helpful. Thanks
> Refactor Fetcher in trunk
> -------------------------
>
> Key: NUTCH-1934
> URL: https://issues.apache.org/jira/browse/NUTCH-1934
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 1.10
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Attachments: NUTCH-1934.patch
>
>
> Put simply
> [Fetcher|https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java]
> is too big.
> This is kinda strange, as the size of this file is unique (I think) among the
> classes within Nutch. The others are reasonably well modularized and
> split into constituent classes which make sense.