[ https://issues.apache.org/jira/browse/S2GRAPH-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16633418#comment-16633418 ]
ASF GitHub Bot commented on S2GRAPH-225:
----------------------------------------

Github user elric-k commented on the issue:

    https://github.com/apache/incubator-s2graph/pull/185

### Grok UDF

In addition, I created a Grok UDF class that uses the grok library for parsing text. The Grok UDF takes the following parameters:

```
patternDir     : base directory of the grok pattern files
patternFiles   : grok pattern files to load
compilePattern : name of the pattern to compile and match against
schema         : schema of the parsed result; if absent, a map type is returned
```

For example, I added two UDFs for parsing s2graph application log messages. Both use grok pattern files stored at a specific path on HDFS and differ only in their compile pattern (each UDF parses only the text that matches its compile pattern):

```
udfs: [
  {
    name: grok_s2app_expr
    class: org.apache.s2graph.s2jobs.udfs.Grok
    params: {
      patternDir: hdfs:///user/s2graph/grok_patterns
      patternFiles: patterns,s2logs
      compilePattern: %{S2GRAPH_EXPR}
    }
  }
  {
    name: grok_s2app_query
    class: org.apache.s2graph.s2jobs.udfs.Grok
    params: {
      patternDir: hdfs:///user/s2graph/grok_patterns
      patternFiles: patterns,s2logs
      compilePattern: %{S2GRAPH_QUERY}
    }
  }
]
```

In SQL you can call each UDF by the name declared above and query the fields parsed by grok:

```
process: [
  {
    name: grok
    inputs: [ app_log ]
    type: sql
    options: {
      sql: '''
        SELECT m.log_level, m.req_path, m.experiment_name, m.res_time, m.status
        FROM (
          SELECT *, COALESCE(grok_s2app_expr(message), grok_s2app_query(message)) AS m
          FROM app_log
        )
      '''
    }
  }
]
```

> support custom udf class
> ------------------------
>
>          Key: S2GRAPH-225
>          URL: https://issues.apache.org/jira/browse/S2GRAPH-225
>      Project: S2Graph
>   Issue Type: Sub-task
>   Components: s2jobs
>     Reporter: Chul Kang
>     Assignee: Chul Kang
>     Priority: Minor
>
> We need to support custom UDFs that are available in SQL queries. UDFs enable new functions in SQL by abstracting their lower-level language implementations. Spark SQL makes it easy to integrate custom UDFs. I think it would be nice to provide an interface for registering custom UDFs in the job description.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
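The mechanics discussed in the comment above (compiling a named grok pattern into a matcher, returning a map only when the text matches, and COALESCE-ing two such UDFs) can be illustrated with a small stdlib-only Python sketch. This is not the s2jobs implementation, which wraps a Java grok library as a Spark UDF; the pattern names, fields, and log formats below are hypothetical, chosen only to mirror the `grok_s2app_expr` / `grok_s2app_query` setup:

```python
import re

# Hypothetical mini pattern base: grok pattern names map to regex fragments,
# and a %{NAME:field} reference expands to a named capture group.
GROK_PATTERNS = {
    "LOGLEVEL": r"(?:INFO|WARN|ERROR|DEBUG)",
    "NUMBER": r"\d+(?:\.\d+)?",
}

def compile_grok(pattern: str) -> re.Pattern:
    """Expand %{NAME:field} references and compile the result to a regex."""
    def expand(m):
        name, field = m.group(1), m.group(2)
        return f"(?P<{field}>{GROK_PATTERNS[name]})"
    return re.compile(re.sub(r"%\{(\w+):(\w+)\}", expand, pattern))

def make_udf(compiled: re.Pattern):
    """Return a parser yielding a field map, or None when the line does not
    match -- mirroring 'each UDF parses only the text that matches'."""
    def udf(message: str):
        m = compiled.search(message)
        return m.groupdict() if m else None
    return udf

# Two parsers over the same pattern base, differing only in compile pattern,
# analogous to the grok_s2app_expr / grok_s2app_query configuration.
parse_expr = make_udf(compile_grok(r"%{LOGLEVEL:log_level} expr took %{NUMBER:res_time}ms"))
parse_query = make_udf(compile_grok(r"%{LOGLEVEL:log_level} query status=%{NUMBER:status}"))

def coalesce(message, *udfs):
    """First non-None result wins, like SQL COALESCE over the two UDFs."""
    for udf in udfs:
        result = udf(message)
        if result is not None:
            return result
    return None

row = coalesce("INFO query status=200", parse_expr, parse_query)
# row == {"log_level": "INFO", "status": "200"}
```

Because each parser returns None (SQL NULL) on non-matching text, COALESCE cleanly selects whichever pattern matched, which is why the two UDFs in the job description can share one SELECT.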