[ https://issues.apache.org/jira/browse/DRILL-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597393#comment-17597393 ]
ASF GitHub Bot commented on DRILL-8289:
---------------------------------------

pjfanning commented on code in PR #2634:
URL: https://github.com/apache/drill/pull/2634#discussion_r957792365


##########
contrib/udfs/src/main/java/org/apache/drill/exec/udfs/ThreatHuntingFunctions.java:
##########
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udfs;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class ThreatHuntingFunctions {
+  /**
+   * Punctuation pattern is useful for comparing log entries. It extracts the all the punctuation and returns
+   * that pattern. Spaces are replaced with an underscore.
+   * <p>
+   * Usage: SELECT punctuation_pattern( string ) FROM...
+   */
+  @FunctionTemplate(names = {"punctuation_pattern", "punctuationPattern"},
+    scope = FunctionTemplate.FunctionScope.SIMPLE,
+    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class PunctuationPatternFunction implements DrillSimpleFunc {
+
+    @Param
+    VarCharHolder rawInput;
+
+    @Output
+    VarCharHolder out;
+
+    @Inject
+    DrillBuf buffer;
+
+    @Override
+    public void setup() {
+    }
+
+    @Override
+    public void eval() {
+
+      String input = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(rawInput.start, rawInput.end, rawInput.buffer);
+
+      String punctuationPattern = input.replaceAll("[a-zA-Z0-9]", "");
+      punctuationPattern = punctuationPattern.replaceAll(" ", "_");
+
+      out.buffer = buffer;
+      out.start = 0;
+      out.end = punctuationPattern.getBytes().length;

Review Comment:
   getBytes is safer if you specify a charset, otherwise you get the JVM default which differs from machine to machine (unless Drill startup shell scripts specify `-Dfile.encoding=...`)



> Add Threat Hunting Functions
> ----------------------------
>
>                 Key: DRILL-8289
>                 URL: https://issues.apache.org/jira/browse/DRILL-8289
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Functions - Drill
>    Affects Versions: 2.0.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 2.0.0
>
>
> # Threat Hunting Functions
> These functions are useful for doing threat hunting with Apache Drill. These
> were inspired by huntlib.[1]
> The functions are:
> * `punctuation_pattern(<string>)`: Extracts the pattern of punctuation in
> text.
> * `entropy(<string>)`: This function calculates the Shannon Entropy of a
> given string of text.
> * `entropyPerByte(<string>)`: This function calculates the Shannon Entropy of
> a given string of text, normed for the string length.
> [1]: https://github.com/target/huntlib



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
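
Regarding the review comment on `getBytes()` above: a minimal sketch of what specifying the charset could look like, using only the standard JDK. The class name `PunctuationPatternSketch` and both helper methods are hypothetical and exist only for illustration; they are not part of the PR, and the sketch deliberately leaves out the `DrillBuf` handling. The idea is to encode the string once with an explicit charset and reuse that byte array for the length, so the result should not depend on the JVM's default encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone class for illustration; not part of PR #2634.
public class PunctuationPatternSketch {

  // Mirrors the two replaceAll calls quoted in the diff:
  // drop ASCII letters and digits, then turn spaces into underscores.
  public static String punctuationPattern(String input) {
    return input.replaceAll("[a-zA-Z0-9]", "").replaceAll(" ", "_");
  }

  // Encode with an explicit charset rather than the platform default.
  public static byte[] toUtf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String pattern = punctuationPattern("GET /index.html HTTP/1.1");
    byte[] bytes = toUtf8(pattern);
    // The byte array's length is what an end offset would be based on.
    System.out.println(pattern + " -> " + bytes.length + " bytes");
  }
}
```

Encoding once and keeping the array also avoids calling `getBytes()` a second time when the bytes are later copied into the output buffer.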