http://gwt-code-reviews.appspot.com/70802/diff/1/7
File eclipse/tools/simple-crawler/.project (right):

http://gwt-code-reviews.appspot.com/70802/diff/1/7#newcode1
Line 1: <?xml version="1.0" encoding="UTF-8"?>
For some reason I can't import this into eclipse successfully.  You
might want to see if it is importable for you into a clean eclipse
instance.

http://gwt-code-reviews.appspot.com/70802/diff/1/2
File tools/simple-crawler/src/com/google/gwt/crawler/Settings.java
(right):

http://gwt-code-reviews.appspot.com/70802/diff/1/2#newcode92
Line 92: + " requires an argument");
No need for line break

http://gwt-code-reviews.appspot.com/70802/diff/1/2#newcode119
Line 119: next_argument : while (!remainingArguments.isEmpty()) {
Wow, slick java syntax

http://gwt-code-reviews.appspot.com/70802/diff/1/2#newcode127
Line 127: if ((settings.sitemap.get() == null) &&
(settings.initUrl.get() == null)) {
Add a blank line before and a comment describing the restrictions being
enforced

http://gwt-code-reviews.appspot.com/70802/diff/1/2#newcode160
Line 160: private List<Setting<?>> allSettings;
Initialize to new ArrayList<Setting<?>> here

http://gwt-code-reviews.appspot.com/70802/diff/1/2#newcode165
Line 165: }
Remove this in favor of initializer above

http://gwt-code-reviews.appspot.com/70802/diff/1/3
File tools/simple-crawler/src/com/google/gwt/crawler/SimpleCrawler.java
(right):

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode16
Line 16: package com.google.gwt.crawler;
I would suggest something like com.google.gwt.tools.simplecrawler so it
doesn't sound overly official

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode37
Line 37: * PLEASE READ, THIS IS NOT THE USUAL YADI YADI.
More commonly spelled "yada yada" or "yadda yadda"

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode43
Line 43: * is initialilzed either by one web site or by a sitemap, and
it follows
initialized
"one web site" -> "a uri" ?

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode51
Line 51: * queue of pages yet to be crawled
Style: capitalize and end with a period (same for comments below)

Why is this public (same comment below) -- use the most private scope
that gets the job done -- if access is needed from outside this class,
consider accessor methods instead

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode57
Line 57: public static Set<String> alreadySeenSet = new
TreeSet<String>();
TreeSet is typically slower than HashSet.  Do you rely on a particular
ordering?

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode61
Line 61: public static PrintWriter out;
!public
This could be initialized to new PrintWriter(System.out, true) right
here.

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode69
Line 69: public static String getUrlContent(String urlString) {
!public

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode74
Line 74: urlInputStream = url.openStream();
Collapse two line above into one line

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode80
Line 80: content = contentScanner.useDelimiter("\\A").next();
kewl

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode87
Line 87: return null;
Print something and make the exception/stack trace available rather than
dying silently

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode92
Line 92: * entry point
Caps & period

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode95
Line 95: SAXException, IOException {
This never actually throws IOException

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode118
Line 118: System.err.println("Could not open sitemap file: " +
e.getMessage());
Wrap this try/catch more tightly around the code that can trigger it,
i.e., the call to parseSitemap.  Otherwise some future revision could
create the opportunity for another IOException in the "big" try/catch
block, and it would be incorrect to generate such a a specific error
message here

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode124
Line 124: * creates a full Url from a hash fragment
Caps & period

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode130
Line 130: String toReturn;
Remove -- this is not C!

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode131
Line 131: String baseUrl = nextUrl.replaceAll("\\?.*", "");
Comment what is happening here (removing trailing ? stuff)

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode136
Line 136: toReturn = baseUrl + hashFragment;
return baseUrl + hashFragment;

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode141
Line 141: * Map to crawler Url, which in short means #@: will be mapped
to
Stray colon?

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode147
Line 147: public static String mapToCrawlerUrl(String url) {
Comment how this works, with examples

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode165
Line 165: public static String mapToOriginalUrl(String url) {
Comment how this works, with examples

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode174
Line 174: * gets the content for all the Urls on the queue, extracts
links from it, and
Caps & period

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode198
Line 198: out.println(nextUrlContent);
This could get pretty verbose

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode199
Line 199: String hrefPattern = "href=\"(.*)\"";
What if the href appears in a comment, or in non-tag text?
Also, does this match the shortest or longest pattern?  If not
guaranteed to be the shortest, you need to ensure the .* part excludes
quotation marks, e.g., [^"]*

http://gwt-code-reviews.appspot.com/70802/diff/1/3#newcode200
Line 200: Matcher matcher =
Pattern.compile(hrefPattern).matcher(nextUrlContent);
Pattern.compile should be called in a static initializer since the
pattern never changes.  It's a slow operation that should definitely not
be in the inner loop of your app.

http://gwt-code-reviews.appspot.com/70802/diff/1/4
File
tools/simple-crawler/src/com/google/gwt/crawler/SimpleSitemapParser.java
(right):

http://gwt-code-reviews.appspot.com/70802/diff/1/4#newcode37
Line 37: public class SimpleSitemapParser {
Document the file format, with an example.

http://gwt-code-reviews.appspot.com/70802/diff/1/4#newcode63
Line 63: private static DefaultHandler parseSiteMap(final Queue<String>
urlQueue) {
Should rename, since this method doesn't actually parse, but instead
returns an object that will be used for parsing later

http://gwt-code-reviews.appspot.com/70802/diff/1/4#newcode65
Line 65:
Delete empty line

http://gwt-code-reviews.appspot.com/70802/diff/1/4#newcode86
Line 86:
Delete empty line

http://gwt-code-reviews.appspot.com/70802

--~--~---------~--~----~------------~-------~--~----~
http://groups.google.com/group/Google-Web-Toolkit-Contributors
-~----------~----~----~----~------~----~------~--~---

Reply via email to