Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RunNutchInEclipse" page has been changed by SebastianNagel:
http://wiki.apache.org/nutch/RunNutchInEclipse?action=diff&rev1=37&rev2=38

Comment:
using Java remote debugger, debugging and timeouts

  Generator$Selector [line: 108] - map
  OutlinkExtractor [line: 71 & 74] - getOutlinks
  }}}
+ 
+ === Remote Debugging in Eclipse ===
+  1. create a new Debug Configuration as [[http://help.eclipse.org/juno/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Ftasks%2Ftask-remotejava_launch_config.htm|Remote Java Application]] and remember the port (here: 37649)
+  1. launch Nutch from the command line, adding options to load the [[http://docs.oracle.com/javase/6/docs/technotes/guides/jpda/architecture.html#jdwp|Java Debugger JDWP Agent Library]], e.g. from bash:
+ {{{
+ % export NUTCH_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=localhost:37649"
+ % $NUTCH_HOME/bin/nutch parsechecker http://myurl.com/
+ }}}
+  1.#3 the application will be suspended just after launch
+  1. now go to Eclipse, set appropriate breakpoints, and run the previously created Debug Configuration
+ Instead of creating an extra launch configuration for every tool you want to debug, a single configuration is enough to debug any tool (parsechecker, indexchecker, URL filter, etc.), even remotely (crawler/tool running on a server, Eclipse debugger running locally).
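
The command-line steps above can be wrapped in a small shell helper so the agent options do not have to be retyped for each tool. This is only a sketch: the function names `jdwp_opts` and `nutch_debug` are hypothetical, and it assumes, as in the example above, that `bin/nutch` picks up `NUTCH_OPTS` from the environment and that port 37649 matches the Eclipse Debug Configuration.

```shell
# Hypothetical helper (not part of Nutch): build the JDWP agent option string.
jdwp_opts() {
  port="${1:-37649}"   # must match the port of the Eclipse Debug Configuration
  suspend="${2:-y}"    # y: wait for the debugger to attach; n: start right away
  printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=localhost:${port}"
}

# Hypothetical wrapper: run any Nutch tool with the debugger agent enabled.
# NUTCH_OPTS is set only for this one invocation, not exported to the shell.
nutch_debug() {
  NUTCH_OPTS="$(jdwp_opts)" "$NUTCH_HOME/bin/nutch" "$@"
}
```

With these definitions, `nutch_debug parsechecker http://myurl.com/` would start the tool suspended until the Eclipse debugger attaches.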
+ 
+ === Debugging and Timeouts ===
+ Debugging takes time, especially when inspecting variables, stack traces, etc. It often takes long enough that a timeout fires and stops the application. In the nutch-site.xml used for debugging, set timeouts to a rather high value (or -1 for unlimited), e.g., when debugging the parser:
+ {{{
+ <property>
+   <name>parser.timeout</name>
+   <value>-1</value>
+ </property>
+ }}}
+ 
  == If things do not work... ==
  Yes, Nutch and Eclipse can be a difficult companionship sometimes ;-)
  
