[ 
https://issues.apache.org/jira/browse/S2GRAPH-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16633418#comment-16633418
 ] 

ASF GitHub Bot commented on S2GRAPH-225:
----------------------------------------

Github user elric-k commented on the issue:

    https://github.com/apache/incubator-s2graph/pull/185
  
    ### Grok UDF
    In addition, I created a Grok UDF class that uses the grok library to parse text.
    The Grok UDF takes the following parameters:
    ```
    patternDir : grok pattern file base directory
    patternFiles : grok pattern files
    compilePattern : pattern name to use
    schema : result data schema; if it is not set, the UDF returns a map type
    ```
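
As a rough illustration of what a compile pattern does, the sketch below expands grok-style `%{NAME:field}` references into a regular expression with named groups and returns the captures as a map. It uses only Python's `re` module, not the grok library used by the PR, and the `PATTERNS` table, `DEMO_LINE`, and all function names are hypothetical stand-ins for the real pattern files.

```python
import re

# Hypothetical pattern definitions, standing in for the contents of the
# files named by patternFiles. These are NOT the real s2graph patterns.
PATTERNS = {
    "WORD": r"\w+",
    "INT": r"\d+",
    "LEVEL": r"(?:INFO|WARN|ERROR)",
    "DEMO_LINE": r"%{LEVEL:log_level} %{WORD:req_path} %{INT:res_time}",
}

# Matches %{NAME} and %{NAME:field} references.
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expr, patterns):
    """Recursively replace %{NAME} / %{NAME:field} with plain regex text."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        body = expand(patterns[name], patterns)
        # A reference with a field name becomes a named capture group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REF.sub(repl, expr)


def compile_pattern(compile_expr, patterns=PATTERNS):
    """Compile an expression such as '%{DEMO_LINE}' into a regex object."""
    return re.compile(expand(compile_expr, patterns))


def parse(regex, text):
    """Return a dict of captured fields, or None when the text does not match."""
    m = regex.search(text)
    return m.groupdict() if m else None


demo = compile_pattern("%{DEMO_LINE}")
print(parse(demo, "INFO graph 12"))
print(parse(demo, "no match here"))
```

Returning `None` for non-matching text mirrors the note that each UDF parses only the text matching its compile pattern; how the actual Grok class signals a non-match is an assumption here.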
    
    For example, I added two UDFs for parsing s2graph application log messages.
    Both UDFs use the same grok files stored in a specific HDFS path and differ only in the compile pattern.
    (Each UDF parses only the text that matches its compile pattern.)
    ```
    udfs: [
          {
            name: grok_s2app_expr
            class: org.apache.s2graph.s2jobs.udfs.Grok
            params: {
              patternDir: hdfs:///user/s2graph/grok_patterns
              patternFiles: patterns,s2logs
              compilePattern: %{S2GRAPH_EXPR}
            }
          }
          {
            name: grok_s2app_query
            class: org.apache.s2graph.s2jobs.udfs.Grok
            params: {
              patternDir: hdfs:///user/s2graph/grok_patterns
              patternFiles: patterns,s2logs
              compilePattern: %{S2GRAPH_QUERY}
            }
          }
        ]
    ```
    
    In SQL, you can call each UDF by the name you gave it in its declaration and query the results parsed by grok.
    ```
    process: [
          {
            name: grok
            inputs: [
              app_log
            ]
            type: sql
            options: {
              sql:
                '''
                SELECT
                  m.log_level,
                  m.req_path,
                  m.experiment_name,
                  m.res_time,
                  m.status
                FROM (
                  SELECT 
                    *,
                    COALESCE(grok_s2app_expr(message), grok_s2app_query(message)) AS m
                  FROM app_log
                )
                '''
            }
          }
        ]
    ```
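
The `COALESCE` in the query picks the first parser that yields a non-null result for each message. A minimal plain-Python sketch of that fallback logic follows, assuming each UDF returns null for text it does not match. The two regular expressions and the sample log formats are invented for illustration and are not the real `S2GRAPH_EXPR` or `S2GRAPH_QUERY` patterns.

```python
import re

# Hypothetical stand-ins for the two compile patterns; the real log
# formats are defined in the grok pattern files, not shown here.
EXPR = re.compile(r"expr=(?P<experiment_name>\w+) status=(?P<status>\d+)")
QUERY = re.compile(r"path=(?P<req_path>\S+) status=(?P<status>\d+)")


def make_udf(regex):
    """Build a parser that returns a field map, or None when nothing matches."""
    def udf(message):
        m = regex.search(message)
        return m.groupdict() if m else None
    return udf


grok_expr = make_udf(EXPR)
grok_query = make_udf(QUERY)


def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def parse_message(message):
    # Try the experiment pattern first, then fall back to the query pattern.
    return coalesce(grok_expr(message), grok_query(message))


for line in ["expr=abtest status=200", "path=/graphs/getEdges status=200", "other"]:
    m = parse_message(line)
    # Fields that the matching pattern did not capture are simply absent.
    print(m.get("req_path") if m else None, m.get("status") if m else None)
```

Because the two patterns capture different fields, a row parsed by one pattern has no value for the fields that only the other pattern defines.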


> support custom udf class
> ------------------------
>
>                 Key: S2GRAPH-225
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-225
>             Project: S2Graph
>          Issue Type: Sub-task
>          Components: s2jobs
>            Reporter: Chul Kang
>            Assignee: Chul Kang
>            Priority: Minor
>
> We need to support custom UDFs that are available in SQL queries.
> UDFs enable new functions in SQL by abstracting their lower-level 
> language implementations. 
> Spark SQL makes it easy to integrate custom UDFs.
> I think it would be nice to provide an interface to register custom UDFs 
> in the Job Description.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
