[ https://issues.apache.org/jira/browse/NUTCH-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235500#comment-15235500 ]
ASF GitHub Bot commented on NUTCH-2248:
---------------------------------------
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/102#discussion_r59242237
--- Diff: src/plugin/parse-css/src/java/org/apache/nutch/parse/css/CssParser.java ---
@@ -0,0 +1,225 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.nutch.parse.css;
+
+import java.io.*;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+
+import com.steadystate.css.parser.CSSOMParser;
+import com.steadystate.css.parser.SACParserCSS3;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.nutch.metadata.Metadata;
+import org.apache.nutch.protocol.Content;
+import org.apache.nutch.parse.*;
+import org.apache.nutch.util.*;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.w3c.css.sac.CSSParseException;
+import org.w3c.css.sac.InputSource;
+import org.w3c.dom.css.*;
+
+
+public class CssParser implements org.apache.nutch.parse.Parser {
+  public static final Logger LOG = LoggerFactory.getLogger(CssParser.class);
+
+  /**
+   * Suppresses warnings, logs all errors, and throws fatal errors
+   */
+  private class ErrorHandler implements org.w3c.css.sac.ErrorHandler {
+    @Override
+    public void warning(CSSParseException exception) { }
+
+    @Override
+    public void error(CSSParseException exception) {
+      LOG.debug("CSS parser error: " + exception.getMessage());
--- End diff --
Can you use SLF4J's parameterized form here, so the message is only assembled when DEBUG is enabled: ```LOG.debug("CSS parser error: {}", exception.getMessage());```?
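To show why the placeholder form is preferred: `"CSS parser error: " + x` builds the string on every call, while a placeholder lets the logger check the level first and skip formatting entirely. The sketch below demonstrates the same effect with `java.util.logging` instead of SLF4J, swapped in only so it runs with the JDK alone (JUL uses `{0}`-style placeholders, not SLF4J's `{}`). The class name `LazyLoggingDemo` and the instrumented `DETAIL` argument are hypothetical stand-ins for `exception.getMessage()`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLoggingDemo {
  // Counts how often the log argument is rendered to a String.
  static final AtomicInteger renders = new AtomicInteger();

  // Hypothetical stand-in for exception.getMessage(); toString() is instrumented.
  static final Object DETAIL = new Object() {
    @Override
    public String toString() {
      renders.incrementAndGet();
      return "unexpected token";
    }
  };

  // Held in a static field so JUL cannot garbage-collect the logger.
  static final Logger LOG = Logger.getLogger("LazyLoggingDemo");

  // Eager: the concatenation (and DETAIL.toString()) runs even though FINE is off.
  static void logEager() {
    LOG.fine("CSS parser error: " + DETAIL);
  }

  // Deferred: the level check happens before any formatting, so nothing is rendered.
  static void logDeferred() {
    LOG.log(Level.FINE, "CSS parser error: {0}", DETAIL);
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO); // FINE (roughly SLF4J's DEBUG) is disabled
    logEager();
    System.out.println("after eager: " + renders.get());
    logDeferred();
    System.out.println("after deferred: " + renders.get());
  }
}
```

Running `main` prints a count of 1 after both calls: only the concatenated version paid for building the message. With SLF4J the `{}` form in the suggestion above gives the same behavior for `LOG.debug`.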
> CSS parser plugin
> -----------------
>
> Key: NUTCH-2248
> URL: https://issues.apache.org/jira/browse/NUTCH-2248
> Project: Nutch
> Issue Type: New Feature
> Components: parser, plugin
> Affects Versions: 1.12
> Reporter: Joseph Naegele
>
> This plugin collects {{uri}} links from CSS stylesheets. This is useful for
> collecting parent stylesheets, fonts, and images needed to display web pages
> as intended.
> Parsed Outlinks have no associated anchors, and no additional text/content is
> parsed from the stylesheet.
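For readers unfamiliar with what "collecting {{uri}} links" involves, here is a rough sketch: find `url(...)` values and `@import "..."` strings, then resolve each against the stylesheet's own URL to get absolute outlinks. This is deliberately simplified and regex-based, not the plugin's approach (the diff uses cssparser's `CSSOMParser`, which also handles comments, escapes, and quoted parentheses that these patterns ignore). `CssLinkSketch` and `extractLinks` are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssLinkSketch {
  // url(...) with optional matching single/double quotes around the value.
  private static final Pattern URL_FUNC =
      Pattern.compile("url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);
  // @import "file.css"; (the @import url(...) form is already caught by URL_FUNC)
  private static final Pattern IMPORT_STRING =
      Pattern.compile("@import\\s+(['\"])(.*?)\\1", Pattern.CASE_INSENSITIVE);

  /** Returns absolute URLs: all url(...) matches first, then quoted @import targets. */
  public static List<String> extractLinks(String css, String baseUrl)
      throws MalformedURLException {
    URL base = new URL(baseUrl);
    List<String> links = new ArrayList<>();
    for (Pattern p : new Pattern[] {URL_FUNC, IMPORT_STRING}) {
      Matcher m = p.matcher(css);
      while (m.find()) {
        String raw = m.group(2).trim();
        if (raw.isEmpty() || raw.startsWith("data:")) {
          continue; // skip empty values and inline data URIs
        }
        try {
          links.add(new URL(base, raw).toString()); // resolve relative to the stylesheet
        } catch (MalformedURLException e) {
          // skip values that cannot be resolved to a URL
        }
      }
    }
    return links;
  }

  public static void main(String[] args) throws MalformedURLException {
    String css = "@import \"reset.css\";\n"
        + "body { background: url('img/bg.png'); }\n"
        + "@font-face { src: url(../fonts/a.woff2); }";
    extractLinks(css, "http://example.com/css/site.css").forEach(System.out::println);
  }
}
```

With the sample input, `main` prints `http://example.com/css/img/bg.png`, `http://example.com/fonts/a.woff2`, and `http://example.com/css/reset.css`. As the description notes, such outlinks carry no anchor text.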