[ http://issues.apache.org/jira/browse/NUTCH-116?page=all ]
Paul Baclace updated NUTCH-116:
-------------------------------
Attachment: required_by_TestNDFS_v2.patch
Change notes revised for patch required_by_TestNDFS_v2.patch, which supersedes
required_by_TestNDFS.patch:
src/java/org/apache/nutch/ipc/Server.java
Set thread names to make it possible to follow logging output and to know when
proper shutdown is completed. Used the safer notifyAll() instead of notify()
for Server.stop() and Server.join(), since the wait condition is on a public
object; added comments that clarify the actual implementation.
Tightened join() to be a proper while (running) { wait(); } loop, which avoids
the hazard of "spurious wakeup" in POSIX threads (as noted in Effective Java by
Joshua Bloch).
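For readers unfamiliar with the idiom, here is a minimal sketch of the guarded
wait described above. The class StoppableService and its members are
hypothetical stand-ins and are not the actual Server.java code; only the
while (running) { wait(); } loop and the use of notifyAll() come from the notes.

```java
// Hypothetical stand-in for a stop()/join() pair; not the actual Server.java.
public class StoppableService {
    private boolean running = true; // guarded by the monitor of "this"

    /** Blocks the caller until stop() has been called. */
    public synchronized void join() throws InterruptedException {
        // Re-check the condition after every wakeup: wait() may return
        // spuriously, or because of an unrelated notify on this object.
        while (running) {
            wait();
        }
    }

    /** Marks the service as stopped and wakes every waiting thread. */
    public synchronized void stop() {
        running = false;
        notifyAll(); // wake all waiters rather than one arbitrary waiter
    }

    public synchronized boolean isRunning() {
        return running;
    }

    public static void main(String[] args) throws InterruptedException {
        StoppableService service = new StoppableService();
        // A named thread is easier to pick out in logging output.
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            service.stop();
        }, "stopper");
        stopper.start();
        service.join();   // returns only after stop() has run
        stopper.join();
        System.out.println("running = " + service.isRunning()); // prints "running = false"
    }
}
```

Because the monitor here is the object itself, any outside code can also wait
on it or notify it, which is why notifyAll() plus a re-checked condition is the
safer combination.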
src/java/org/apache/nutch/ndfs/DataNode.java
Improved logging details, added comments, improved an error message,
refactored reusable code into makeInstanceForDir(), added toString(),
added properties ndfs.blockreport.intervalMsec and ndfs.datanode.startupMsec
to allow the override of BLOCKREPORT_INTERVAL and DATANODE_STARTUP_PERIOD,
respectively, in order to speed up TestNDFS runs (otherwise it would take an
hour).
These FSConstants fields are worth keeping as default values for when a property
is not set, so that the lookup idiom is:
conf.getLong("ndfs.datanode.startupMsec", DATANODE_STARTUP_PERIOD);
instead of:
conf.getLong("ndfs.datanode.startupMsec", 1000*60*10);
When a property lookup occurs in more than one place, it is best to have the
default value come from FSConstants rather than have multiple, possibly
different, literal values as the default.
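A small sketch of why the shared constant helps. It uses java.util.Properties
and a hypothetical getLong() helper as a stand-in for the Nutch configuration
object, and it assumes the constant equals the ten-minute literal shown above;
neither is taken from the patch itself.

```java
import java.util.Properties;

public class DefaultLookupSketch {
    // Stand-in for the FSConstants field; the value is assumed to match
    // the 1000*60*10 literal quoted in the change notes.
    static final long DATANODE_STARTUP_PERIOD = 1000L * 60 * 10;

    /** Hypothetical helper: returns the property as a long, or the default if unset. */
    static long getLong(Properties conf, String name, long defaultValue) {
        String value = conf.getProperty(name);
        return (value == null) ? defaultValue : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // Property not set: every call site falls back to the same constant.
        System.out.println(
            getLong(conf, "ndfs.datanode.startupMsec", DATANODE_STARTUP_PERIOD)); // prints 600000

        // A test can override the value to shorten the startup period.
        conf.setProperty("ndfs.datanode.startupMsec", "2000");
        System.out.println(
            getLong(conf, "ndfs.datanode.startupMsec", DATANODE_STARTUP_PERIOD)); // prints 2000
    }
}
```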
src/java/org/apache/nutch/ndfs/FSDataset.java
added toString() methods used in logging elsewhere.
src/java/org/apache/nutch/ndfs/FSNamesystem.java
Changed chooseTarget() to behave as commented rather than as implemented. The
comment says it forbids picking a target on the same host, but it was using
host:port as the basis of comparison, so different ports on the same host
would appear to be different hosts. This mistake was probably the result of
DatanodeInfo.getName() really returning host:port, not just the hostname that
the method name implies (DatanodeInfo.getHost() removes the port number).
Added property test.ndfs.same.host.targets.allowed, which allows target
datanode selection to use the same host (the same host:port is never allowed).
TestNDFS uses host:port comparison, and normal operation just uses 'host' to
better distribute replicas.
Simplified a chooseTarget() conditional which was redundantly
checking against forbidden1, forbidden2 and the just-constructed
forbiddenMachines containing the union of forbidden1 and forbidden2:
if ((forbidden1 == null || ! forbidden1.contains(node)) &&
(forbidden2 == null || ! forbidden2.contains(node)) &&
(! forbiddenMachines.contains(node.getName()))) {
The following:
forbidden1.contains(node) == forbiddenMachines.contains(node.getName())
is always true and uses host:port for the comparison.
Added logging for previously silent errors, emitted more info for some logging,
changed LOG.info() to LOG.warning(), added javadoc comments.
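To illustrate the difference between comparing on host and comparing on
host:port in target selection, here is a simplified sketch. It works on plain
"host:port" strings; the class, the hostOf() helper and the sameHostAllowed
flag are hypothetical and only mirror the behavior described above, not the
real chooseTarget() signature.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TargetFilterSketch {
    /** Strips the port from a "host:port" name; returns the name unchanged if it has no port. */
    static String hostOf(String name) {
        int colon = name.indexOf(':');
        return (colon < 0) ? name : name.substring(0, colon);
    }

    /**
     * Returns the first candidate that is not forbidden, or null if none qualifies.
     * When sameHostAllowed is false, a candidate is rejected if it shares a host
     * with any forbidden node; when true, only an exact host:port match is rejected.
     */
    static String chooseTarget(List<String> candidates,
                               Set<String> forbiddenNames,
                               boolean sameHostAllowed) {
        Set<String> forbiddenKeys = new HashSet<>();
        for (String name : forbiddenNames) {
            forbiddenKeys.add(sameHostAllowed ? name : hostOf(name));
        }
        for (String candidate : candidates) {
            String key = sameHostAllowed ? candidate : hostOf(candidate);
            if (!forbiddenKeys.contains(key)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("alpha:7000", "alpha:7001", "beta:7000");
        Set<String> forbidden = new HashSet<>(Arrays.asList("alpha:7000"));

        // Normal operation: the other port on host "alpha" is rejected too.
        System.out.println(chooseTarget(candidates, forbidden, false)); // prints beta:7000

        // Test mode: several datanodes share one host, so only host:port must differ.
        System.out.println(chooseTarget(candidates, forbidden, true));  // prints alpha:7001
    }
}
```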
src/java/org/apache/nutch/ndfs/NameNode.java
Added a way to stop the daemon for JUnit testing, added javadoc comments,
renamed offerService() to join() to better indicate what the method
really does, added property ndfs.namenode.handler.count to adjust the
number of handlers to speed up testing, changed access of some fields
from package to private (protected is also reasonable) to quickly indicate
how self-contained the class is when studying the code.
> TestNDFS a JUnit test specifically for NDFS
> -------------------------------------------
>
> Key: NUTCH-116
> URL: http://issues.apache.org/jira/browse/NUTCH-116
> Project: Nutch
> Type: Test
> Components: fetcher, indexer, searcher
> Versions: 0.8-dev
> Reporter: Paul Baclace
> Attachments: TestNDFS.java, required_by_TestNDFS.patch,
> required_by_TestNDFS_v2.patch
>
> TestNDFS is a JUnit test for NDFS using "pseudo multiprocessing" (or more
> strictly, pseudo distributed) meaning all daemons run in one process and
> sockets are used to communicate between daemons.
> The test permutes various block sizes, number of files, file sizes, and
> number of datanodes. After creating 1 or more files and filling them with
> random data, one datanode is shut down, and then the files are verified.
> Next, all the random test files are deleted and we test for leakage
> (non-deletion) by directly checking the real directories corresponding to the
> datanodes still running.