[ 
https://issues.apache.org/jira/browse/DRILL-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923365#comment-16923365
 ] 

ASF GitHub Bot commented on DRILL-7343:
---------------------------------------

arina-ielchiieva commented on pull request #1840: DRILL-7343: Add User-Agent 
UDFs to Drill
URL: https://github.com/apache/drill/pull/1840#discussion_r321226651
 
 

 ##########
 File path: contrib/udfs/README.md
 ##########
 @@ -0,0 +1,58 @@
+# Drill User Defined Functions
+
+This `README` documents functions which users have submitted to Apache Drill.  
+
+## User Agent Functions
+Drill UDF for parsing User Agent Strings.
+This function is based on Niels Basjes' Java library for parsing user agent 
strings, which is available here: https://github.com/nielsbasjes/yauaa.
+
+### Usage
+The function `parse_user_agent()` takes a user agent string as an argument and 
returns a map of the available fields.  Note that not every field will be 
present in every user agent string. 
+```
+SELECT parse_user_agent( columns[0] ) as ua 
+FROM dfs.`/tmp/data/drill-httpd/ua.csv`;
+```
+The query above returns:
+```
+{
+  "DeviceClass":"Desktop",
+  "DeviceName":"Macintosh",
+  "DeviceBrand":"Apple",
+  "OperatingSystemClass":"Desktop",
+  "OperatingSystemName":"Mac OS X",
+  "OperatingSystemVersion":"10.10.1",
+  "OperatingSystemNameVersion":"Mac OS X 10.10.1",
+  "LayoutEngineClass":"Browser",
+  "LayoutEngineName":"Blink",
+  "LayoutEngineVersion":"39.0",
+  "LayoutEngineVersionMajor":"39",
+  "LayoutEngineNameVersion":"Blink 39.0",
+  "LayoutEngineNameVersionMajor":"Blink 39",
+  "AgentClass":"Browser",
+  "AgentName":"Chrome",
+  "AgentVersion":"39.0.2171.99",
+  "AgentVersionMajor":"39",
+  "AgentNameVersion":"Chrome 39.0.2171.99",
+  "AgentNameVersionMajor":"Chrome 39",
+  "DeviceCpu":"Intel"
+}
+```
+The function returns a Drill map, so you can access any of the fields using 
Drill's `table.map.key` notation. For example, the query below illustrates how to 
extract a field from this map and summarize it:
+
+```
+SELECT uadata.ua.AgentNameVersion AS Browser,
+COUNT( * ) AS BrowserCount
+FROM (
+   SELECT parse_user_agent( columns[0] ) AS ua
+   FROM dfs.drillworkshop.`user-agents.csv`
+) AS uadata
+GROUP BY uadata.ua.AgentNameVersion
+ORDER BY BrowserCount DESC
+```
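+As a further sketch (an illustration only, not taken from the function's tests; it assumes the sample file from the first query and the field names shown in the sample output), a map field can presumably also be used in a `WHERE` clause, for example to count device classes among browser traffic only:
+```
+SELECT uadata.ua.DeviceClass AS DeviceClass,
+COUNT( * ) AS DeviceCount
+FROM (
+   SELECT parse_user_agent( columns[0] ) AS ua
+   FROM dfs.`/tmp/data/drill-httpd/ua.csv`
+) AS uadata
+WHERE uadata.ua.AgentClass = 'Browser'
+GROUP BY uadata.ua.DeviceClass
+ORDER BY DeviceCount DESC
+```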
+The function can also be called with an optional field name as a second argument. For example:
+```
+SELECT parse_user_agent( `user_agent`, 'AgentName' ) as AgentName ...
+```
+which will just return the requested field.  If the user agent string is 
empty, all fields will have the value of `Hacker`.  
 
 Review comment:
   ```suggestion
   which will just return the requested field. If the user agent string is 
empty, all fields will have the value of `Hacker`.  
   ```
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add User-Agent UDFs to Drill
> ----------------------------
>
>                 Key: DRILL-7343
>                 URL: https://issues.apache.org/jira/browse/DRILL-7343
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: 1.17.0
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.17.0
>
>
> This collection of UDFs adds the ability to parse user agent strings, which is 
> useful for security data analysis. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
