[ 
https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597393#comment-17597393
 ] 

ASF GitHub Bot commented on DRILL-8289:
---------------------------------------

pjfanning commented on code in PR #2634:
URL: https://github.com/apache/drill/pull/2634#discussion_r957792365


##########
contrib/udfs/src/main/java/org/apache/drill/exec/udfs/ThreatHuntingFunctions.java:
##########
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udfs;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class ThreatHuntingFunctions {
+  /**
+   * Punctuation pattern is useful for comparing log entries.  It extracts all the punctuation and returns
+   * that pattern.  Spaces are replaced with an underscore.
+   * <p>
+   * Usage: SELECT punctuation_pattern( string ) FROM...
+   */
+  @FunctionTemplate(names = {"punctuation_pattern", "punctuationPattern"},
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class PunctuationPatternFunction implements DrillSimpleFunc {
+
+    @Param
+    VarCharHolder rawInput;
+
+    @Output
+    VarCharHolder out;
+
+    @Inject
+    DrillBuf buffer;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+
+      String input = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(rawInput.start, rawInput.end, rawInput.buffer);
+
+      String punctuationPattern = input.replaceAll("[a-zA-Z0-9]", "");
+      punctuationPattern = punctuationPattern.replaceAll(" ", "_");
+
+      out.buffer = buffer;
+      out.start = 0;
+      out.end = punctuationPattern.getBytes().length;

Review Comment:
   `getBytes()` is safer if you specify a charset. Otherwise you get the JVM
default charset, which can differ from machine to machine (unless the Drill
startup shell scripts specify `-Dfile.encoding=...`).
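
A minimal standalone sketch of the suggestion, outside Drill's UDF framework. The class and method names (`PunctuationPatternSketch`, `punctuationPattern`, `encode`) are hypothetical and only for illustration; the two regex steps are copied from the `eval()` body above, and the only change is the explicit `StandardCharsets.UTF_8` argument. Inside the actual UDF the charset would probably need to be written fully qualified (`java.nio.charset.StandardCharsets.UTF_8`), as the existing `StringFunctionHelpers` call is, but that is an assumption about how the PR would be updated.

```java
import java.nio.charset.StandardCharsets;

final class PunctuationPatternSketch {

  // Same two regex steps as the eval() body under review: strip ASCII
  // letters and digits, then replace each space with an underscore.
  static String punctuationPattern(String input) {
    return input.replaceAll("[a-zA-Z0-9]", "").replaceAll(" ", "_");
  }

  // Encode with an explicit charset so the resulting bytes (and their
  // count) do not depend on the JVM's file.encoding setting.
  static byte[] encode(String pattern) {
    return pattern.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String pattern = punctuationPattern("GET /index.html HTTP/1.1");
    byte[] bytes = encode(pattern);
    System.out.println(pattern + " (" + bytes.length + " bytes)");
  }
}
```

Computing the byte array once and using its length for `out.end` would also avoid encoding the string twice.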





> Add Threat Hunting Functions
> ----------------------------
>
>                 Key: DRILL-8289
>                 URL: https://issues.apache.org/jira/browse/DRILL-8289
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Functions - Drill
>    Affects Versions: 2.0.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These 
> were inspired by huntlib.[1]
> The functions are: 
> * `punctuation_pattern(<string>)`: Extracts the pattern of punctuation in 
> text.
> * `entropy(<string>)`: This function calculates the Shannon Entropy of a 
> given string of text.
> * `entropyPerByte(<string>)`: This function calculates the Shannon Entropy of 
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib
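
For readers unfamiliar with the two entropy functions described above, here is a standalone sketch of the computation. It is not the code from the PR: the class `EntropySketch` and its methods are hypothetical, the entropy is computed over Java `char` values rather than bytes, and "normed for the string length" is assumed to mean dividing the entropy by the number of characters.

```java
import java.util.HashMap;
import java.util.Map;

final class EntropySketch {

  // Shannon entropy in bits per symbol: H = -sum(p * log2(p)), where p is
  // the relative frequency of each distinct character in the text.
  static double entropy(String text) {
    if (text.isEmpty()) {
      return 0.0;
    }
    Map<Character, Integer> counts = new HashMap<>();
    for (char c : text.toCharArray()) {
      counts.merge(c, 1, Integer::sum);
    }
    double h = 0.0;
    for (int count : counts.values()) {
      double p = (double) count / text.length();
      h -= p * (Math.log(p) / Math.log(2));
    }
    return h;
  }

  // Assumption: the length-normalized variant divides by the character count.
  static double entropyPerByte(String text) {
    return text.isEmpty() ? 0.0 : entropy(text) / text.length();
  }

  public static void main(String[] args) {
    System.out.println(entropy("aaaa"));
    System.out.println(entropy("ab"));
    System.out.println(entropyPerByte("ab"));
  }
}
```

A string of one repeated character has zero entropy, while a string whose characters are all distinct has the maximum entropy for its length, which is why high values can flag random-looking strings in logs.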



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
