Tim Starling has uploaded a new change for review. https://gerrit.wikimedia.org/r/235669
Change subject: Use /document as the path instead of /depurate
......................................................................

Use /document as the path instead of /depurate

This is conceived as a format specifier, for forwards compatibility with
other output formats.

Also add a README.md file for documentation.

Change-Id: I31f060d0025cf7f1a0393d7940d0a7f31e9ccb3a
---
A README.md
M src/main/java/org/wikimedia/html5depurate/DepurateDaemon.java
M src/main/java/org/wikimedia/html5depurate/DepurateHandler.java
3 files changed, 60 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/services/html5depurate refs/changes/69/235669/1

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..cb6ac6b
--- /dev/null
+++ b/README.md
@@ -0,0 +1,53 @@
+This is an HTTP frontend for the validator.nu HTML 5 parser. It parses some
+input text and returns the reserialized HTML.
+
+## Compile and test
+
+Ubuntu build/test dependencies:
+* openjdk-7-jdk
+* maven2
+* jsvc
+
+Compile with `mvn compile`. Then `mvn dependency:build-classpath` will display
+a classpath suitable for testing. Then the daemon can be started with something
+like:
+
+```
+/usr/bin/jsvc \
+    -cp "$classpath":target/classes \
+    -pidfile /tmp/html5depurate.pid \
+    -errfile /tmp/html5depurate.err \
+    -outfile /tmp/html5depurate.out \
+    -procname html5depurate \
+    org.wikimedia.html5depurate.DepurateDaemon
+```
+
+The default log format is pretty bad but can be configured by the usual means,
+with -Djava.util.logging.config.file=/path/to/logging.properties
+
+Then to test:
+
+```
+curl http://localhost:4339/document -F text=foo
+```
+
+This will return an HTML document which is a reserialized version of "foo".
+
+## To do
+
+* Debian packaging
+  - A SysV init script wrapping jsvc should be fairly simple.
+  - Very strong security guarantees are possible by using a security.policy
+    file.
+  - Most Maven dependencies are packaged already, with the exception of the
+    validator.nu parser itself, which needs to be bundled.
+
+* Collect warnings/errors and provide a JSON serialized return format
+  exposed at /info.
+
+* Help out MW a bit by extracting the contents of the body tag. This could be
+  provided at /body.
+
+* A servlet version, if someone needs that. An early version depended on a
+  servlet container, but I abandoned that approach in favour of the robustness
+  and management simplicity of a standalone daemon.
diff --git a/src/main/java/org/wikimedia/html5depurate/DepurateDaemon.java b/src/main/java/org/wikimedia/html5depurate/DepurateDaemon.java
index a171c7b..a48d486 100644
--- a/src/main/java/org/wikimedia/html5depurate/DepurateDaemon.java
+++ b/src/main/java/org/wikimedia/html5depurate/DepurateDaemon.java
@@ -96,7 +96,7 @@
 			new NetworkListener("depurate", config.host, config.port));
 
 		ServerConfiguration serverConf = m_server.getServerConfiguration();
-		serverConf.addHttpHandler(new DepurateHandler(config), "/depurate");
+		serverConf.addHttpHandler(new DepurateHandler(config), "/document", "/body");
 		serverConf.setDefaultErrorPageGenerator(new DepurateErrorPageGenerator());
 		serverConf.setName("depurate");
 		m_server.start();
diff --git a/src/main/java/org/wikimedia/html5depurate/DepurateHandler.java b/src/main/java/org/wikimedia/html5depurate/DepurateHandler.java
index 40ca68c..69fbed0 100644
--- a/src/main/java/org/wikimedia/html5depurate/DepurateHandler.java
+++ b/src/main/java/org/wikimedia/html5depurate/DepurateHandler.java
@@ -24,6 +24,7 @@
 	final private Config m_config;
 	Logger m_logger = Logger.getLogger(this.getClass().getName());
 
+
 	DepurateHandler(Config config) {
 		super("depurate");
 		m_config = config;
@@ -35,6 +36,11 @@
 	{
 		m_logger.finer("Request received");
 
+		if (!request.getContextPath().equals("/document")) {
+			sendError(response, 404, "Only /document is supported");
+			return;
+		}
+
 		response.suspend();
 		request.setCharacterEncoding("UTF-8");
 		final MultipartBuffer buf = new MultipartBuffer(m_config.maxPostSize);

-- 
To view, visit https://gerrit.wikimedia.org/r/235669
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I31f060d0025cf7f1a0393d7940d0a7f31e9ccb3a
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/services/html5depurate
Gerrit-Branch: master
Gerrit-Owner: Tim Starling <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
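The README's smoke test posts the input to the new `/document` path. It sends the input as a multipart form field named `text`, which is the format `curl -F` produces. Below is a rough Java equivalent. It uses only `java.net.HttpURLConnection`, so it should also build on the openjdk-7 toolchain the README lists. This is a sketch under stated assumptions, not part of html5depurate:

- The class and method names are hypothetical.
- The boundary string is arbitrary.
- The response body is assumed to be UTF-8.
- Only the host, port, path and field name come from the README example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client sketch; not an html5depurate API.
final class DepurateClient {
    // Arbitrary multipart boundary (assumption); it must not occur in the payload.
    static final String BOUNDARY = "----depurate-sketch-boundary";

    /** Builds a multipart/form-data body with one "text" field, like `curl -F text=...`. */
    static byte[] buildMultipartBody(String text) {
        String body = "--" + BOUNDARY + "\r\n"
                + "Content-Disposition: form-data; name=\"text\"\r\n"
                + "\r\n"
                + text + "\r\n"
                + "--" + BOUNDARY + "--\r\n";
        return body.getBytes(StandardCharsets.UTF_8);
    }

    /** POSTs the text to the endpoint and returns the response body (assumed UTF-8). */
    static String depurate(String endpoint, String text) throws IOException {
        byte[] payload = buildMultipartBody(text);
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + BOUNDARY);
            conn.setFixedLengthStreamingMode(payload.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload);
            }
            int status = conn.getResponseCode();
            // Error responses (e.g. a 404 for an unsupported path) arrive on the error stream.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            if (in == null) {
                return "";
            }
            try (InputStream stream = in) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = stream.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the README's curl example.
        String endpoint = args.length > 0 ? args[0] : "http://localhost:4339/document";
        String text = args.length > 1 ? args[1] : "foo";
        try {
            System.out.println(depurate(endpoint, text));
        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```

Run with no arguments, it repeats the README's `foo` test against localhost:4339. Alternatively, pass an endpoint and a text. With this patch applied, `/depurate` is no longer mapped at all. Per the new check in `DepurateHandler`, a request to `/body` should receive a 404 with the message "Only /document is supported".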
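The commit message describes the path as a format specifier. This patch maps both `/document` and `/body` to `DepurateHandler`, but the handler accepts only `/document` and answers anything else with a 404. The README's to-do list adds `/body` and `/info` as possible future formats. One way to keep that extensible is a lookup table from path to output format. The sketch below is hypothetical:

- `FormatDispatch`, `OutputFormat` and `statusFor` are invented names.
- The 200 status stands in for whatever the real handler sends on success.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of path-as-format-specifier dispatch; not html5depurate code.
final class FormatDispatch {
    // Only the format this patch actually serves; /body and /info would be added later.
    enum OutputFormat { DOCUMENT }

    private static final Map<String, OutputFormat> FORMATS;
    static {
        Map<String, OutputFormat> m = new HashMap<>();
        m.put("/document", OutputFormat.DOCUMENT);
        FORMATS = Collections.unmodifiableMap(m);
    }

    /** Returns the output format named by the path, or null if none is supported. */
    static OutputFormat formatFor(String path) {
        return FORMATS.get(path);
    }

    /** Status a handler would send: 404 for unknown formats, 200 (assumed) otherwise. */
    static int statusFor(String path) {
        return formatFor(path) == null ? 404 : 200;
    }

    public static void main(String[] args) {
        for (String path : new String[] {"/document", "/body", "/depurate"}) {
            System.out.println(path + " -> " + statusFor(path));
        }
    }
}
```

With this layout, supporting `/body` would mean adding one enum constant and one map entry rather than another hard-coded string comparison in the handler.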
