GitHub user merrimanr opened a pull request:
https://github.com/apache/metron/pull/1250
METRON-1850: Stellar REST function
## Contributor Comments
This PR adds a Stellar REST function that can be used to enrich messages
with data from 3rd party REST services. This function leverages the Apache
HttpComponents library for HTTP requests.
The function call follows this format: `REST_GET(rest uri, optional rest settings)`
There are a handful of settings including basic authentication credentials,
proxy support, and timeouts. These settings are included in the `RestConfig`
class. Any of these settings can be defined in the global config under the
`stellar.rest.settings` property and will override any default values. The
global config settings can also be overridden by passing in a config object to
the expression. This will allow support for multiple REST services that may
have different requirements.
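To make the precedence concrete, here is a minimal sketch of how the three layers could be merged. It is not the code in this PR: the class and method names are hypothetical, and the `timeout` key is an assumed name used only for illustration (`proxy.host` and `proxy.port` are settings shown later in this description).

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: layers REST settings so that later sources win. */
class RestSettingsSketch {

  /** Merges settings in precedence order: defaults, global config, expression config. */
  static Map<String, Object> resolve(Map<String, Object> defaults,
                                     Map<String, Object> globalSettings,
                                     Map<String, Object> expressionSettings) {
    Map<String, Object> resolved = new HashMap<>(defaults);
    if (globalSettings != null) {
      resolved.putAll(globalSettings);      // stellar.rest.settings overrides defaults
    }
    if (expressionSettings != null) {
      resolved.putAll(expressionSettings);  // config passed to REST_GET overrides both
    }
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, Object> defaults = new HashMap<>();
    defaults.put("timeout", 1000);          // assumed key name

    Map<String, Object> global = new HashMap<>();
    global.put("proxy.host", "node1");
    global.put("proxy.port", 3128);

    Map<String, Object> expression = new HashMap<>();
    expression.put("timeout", 500);

    // The expression-level timeout wins; the proxy settings come from the global config.
    System.out.println(resolve(defaults, global, expression));
  }
}
```

Because the expression-level map is applied last, two `REST_GET` calls in the same topology can target services with different requirements without touching the global config.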
Responses are expected to be in JSON format. Successful requests will
return a MAP object in Stellar. Errors will be logged and null will be
returned by default. There are ways to override this behavior by configuring a
list of acceptable status codes and/or values to be returned on error or empty
responses.
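The decision logic described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the implementation in this PR: the class and method names are hypothetical, JSON parsing is left out, and a `null` body stands in for an empty response.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative only: chooses the value a REST call returns to Stellar. */
class RestResponseSketch {

  /**
   * @param statusCode   HTTP status of the response
   * @param body         parsed JSON body, or null when the response had no content
   * @param allowedCodes status codes that are not treated as errors
   * @param errorValue   value returned on error (null by default)
   * @param emptyValue   value returned for an allowed but empty response (null by default)
   */
  static Object handle(int statusCode, Map<String, ?> body, List<Integer> allowedCodes,
                       Object errorValue, Object emptyValue) {
    if (!allowedCodes.contains(statusCode)) {
      return errorValue;   // a real implementation would also log the failure
    }
    if (body == null) {
      return emptyValue;
    }
    return body;
  }

  public static void main(String[] args) {
    List<Integer> defaults = Arrays.asList(200);
    List<Integer> lenient = Arrays.asList(200, 404);

    // Allowed status with a body: the parsed map is returned.
    System.out.println(handle(200, Collections.singletonMap("authenticated", true), defaults, null, null));
    // 404 is an error by default: the error value (null unless overridden) is returned.
    System.out.println(handle(404, null, defaults, null, null));
    // 404 allowed and no body: the empty-content override is returned.
    System.out.println(handle(404, null, lenient, null, Collections.emptyMap()));
    // 500 with an error value override.
    System.out.println(handle(500, null, defaults, "got an error", null));
  }
}
```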
Considering this will be used on streaming data, how we handle timeouts is
important. This function exposes HttpClient timeout settings but those are not
enough to keep unwanted latency from being introduced. A hard timeout is
implemented in the function to abort a request if the timeout is exceeded. The
idea is to guarantee the total request time will not exceed a configured value.
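The idea of a hard timeout can be illustrated with a small standalone sketch. Note that this swaps in a different technique from the one used in this PR: the PR aborts the HttpClient request itself, whereas the sketch below simply bounds the wait on a `Future` and cancels the task. The class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: bounds the total time spent waiting on a request. */
class HardTimeoutSketch {

  /** Runs the request and returns its result, or onTimeout if it fails or takes too long. */
  static <T> T callWithHardTimeout(Callable<T> request, long timeoutMillis, T onTimeout) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<T> future = executor.submit(request);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true);                  // interrupt the in-flight work
      return onTimeout;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();   // preserve the interrupt for the caller
      return onTimeout;
    } catch (ExecutionException e) {
      return onTimeout;                     // the request itself failed
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // Simulated slow service: exceeds the 100 ms budget, so the fallback (null) is returned.
    String slow = callWithHardTimeout(() -> { Thread.sleep(2000); return "response"; }, 100, null);
    // Fast service: completes well within the budget.
    String fast = callWithHardTimeout(() -> "response", 1000, null);
    System.out.println(slow);   // null
    System.out.println(fast);   // response
  }
}
```

The point of the design is that the caller's latency is capped by the configured value regardless of which phase of the request (connecting, waiting for data, reading the body) is slow.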
### Changes Included
- HttpClient capability added to Stellar
- HttpClient setup added to the various bolts and Stellar REPL
- Utility added for setting up a pooling HttpClient (and possibly other
types of clients in the future)
- Configuration mechanism added for Stellar REST function including
settings for authentication, proxy support, timeouts and other settings
- Function implementation with accompanying unit and integration tests
### Testing
There are several different ways to test this feature and I would encourage
reviewers to get creative and look for cases I may not have thought of. For my
testing, I used an online HTTP service that provides simple endpoints for
simulating different use cases: `http://httpbin.org/#/`. Feel free to try
your own or use this one.
I tested this in full dev using the Stellar REPL and the parser and
enrichment topologies. First you need to perform a couple of setup steps:
1. Spin up full dev and ensure everything comes up and data is being indexed
2. SSH to full dev and install the Squid proxy server:
```
yum -y install squid
```
3. Create a password file that Squid can use for basic authentication:
```
yum -y install httpd-tools
touch /etc/squid/passwd && chown squid /etc/squid/passwd
htpasswd /etc/squid/passwd user # (Will prompt for a password)
```
4. Configure Squid for basic authentication by adding these lines to
`/etc/squid/squid.conf`, under the lines with `acl Safe_ports*`:
```
auth_param basic program /usr/lib64/squid/ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid Basic Authentication
auth_param basic credentialsttl 2 hours
acl auth_users proxy_auth REQUIRED
http_access allow auth_users
```
5. Start Squid and verify it is working correctly:
```
service squid restart
curl --proxy-user user:password -x node1:3128 http://www.google.com/
```
6. Next create password files in HDFS:
```
su hdfs
cd ~
echo passwd > basicPassword.txt
hdfs dfs -put basicPassword.txt /apps/metron
echo password > proxyPassword.txt
hdfs dfs -put proxyPassword.txt /apps/metron
exit
```
To test with the Stellar REPL, follow these steps:
1. Start the Stellar REPL and verify the `REST_GET` function is available:
```
/usr/metron/0.6.1/bin/stellar --zookeeper node1:2181
[Stellar]>>> %functions REST
REST_GET
```
2. Test a simple get request:
```
[Stellar]>>> REST_GET('http://httpbin.org/get')
{args={}, headers={Accept=application/json, Accept-Encoding=gzip,deflate,
Cache-Control=max-age=259200, Connection=close, Host=httpbin.org,
User-Agent=Apache-HttpClient/4.3.2 (java 1.5)}, origin=127.0.0.1,
136.62.241.236, url=http://httpbin.org/get}
```
3. Test a get request with basic authentication:
```
[Stellar]>>> config := {'basic.auth.user':'user','basic.auth.password.path':'/apps/metron/basicPassword.txt'}
{basic.auth.user=user,
basic.auth.password.path=/apps/metron/basicPassword.txt}
[Stellar]>>> REST_GET('http://httpbin.org/basic-auth/user/passwd', config)
{authenticated=true, user=user}
```
4. Try the same request without passing in the config. You should get an
authentication error:
```
[Stellar]>>> REST_GET('http://httpbin.org/basic-auth/user/passwd')
2018-10-28 00:32:20 ERROR RestFunctions:161 - Stellar REST request to http://httpbin.org/basic-auth/user/passwd expected status code to be one of [200] but failed with http status code 401:
java.io.IOException: Stellar REST request to http://httpbin.org/basic-auth/user/passwd expected status code to be one of [200] but failed with http status code 401:
    at org.apache.metron.stellar.dsl.functions.RestFunctions$RestGet.doGet(RestFunctions.java:209)
    at org.apache.metron.stellar.dsl.functions.RestFunctions$RestGet.apply(RestFunctions.java:157)
    at org.apache.metron.stellar.common.StellarCompiler.lambda$exitTransformationFunc$13(StellarCompiler.java:652)
    at org.apache.metron.stellar.common.StellarCompiler$Expression.apply(StellarCompiler.java:250)
    at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:151)
    at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.executeStellar(DefaultStellarShellExecutor.java:409)
    at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:260)
    at org.apache.metron.stellar.common.shell.cli.StellarShell.execute(StellarShell.java:357)
    at org.jboss.aesh.console.AeshProcess.run(AeshProcess.java:53)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
5. You should also be able to set the basic authentication settings through
the global config:
```
[Stellar]>>> %define stellar.rest.settings := {'basic.auth.user':'user','basic.auth.password.path':'/apps/metron/basicPassword.txt'}
{basic.auth.user=user,
basic.auth.password.path=/apps/metron/basicPassword.txt}
[Stellar]>>> REST_GET('http://httpbin.org/basic-auth/user/passwd')
{authenticated=true, user=user}
```
6. Now verify you can send a request through the proxy:
```
[Stellar]>>> config := {'proxy.host':'node1','proxy.port':3128,'proxy.basic.auth.user':'user','proxy.basic.auth.password.path':'/apps/metron/proxyPassword.txt'}
{proxy.basic.auth.password.path=/apps/metron/proxyPassword.txt,
proxy.port=3128, proxy.host=node1, proxy.basic.auth.user=user}
[Stellar]>>> REST_GET('http://httpbin.org/get', config)
{args={}, headers={Accept=application/json, Accept-Encoding=gzip,deflate,
Cache-Control=max-age=259200, Connection=close, Host=httpbin.org,
User-Agent=Apache-HttpClient/4.3.2 (java 1.5)}, origin=127.0.0.1,
136.62.241.236, url=http://httpbin.org/get}
```
7. Leave out the proxy credentials and you should get a proxy error:
```
[Stellar]>>> config := {'proxy.host':'node1','proxy.port':3128}
{proxy.port=3128, proxy.host=node1}
[Stellar]>>> REST_GET('http://httpbin.org/get', config)
2018-10-28 00:43:48 ERROR RestFunctions:161 - Stellar REST request to
http://httpbin.org/get expected status code to be one of [200] but failed with
http status code 407: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>ERROR: Cache Access Denied</title>
```
8. The timeout is 1000 milliseconds by default. Test it by setting the timeout
to 1 ms, or to any value too small for a request to finish in time. You should
get an error:
```
[Stellar]>>> REST_GET('http://httpbin.org/get', config)
2018-10-28 00:53:07 ERROR RestFunctions:161 - Total Stellar REST request time to http://httpbin.org/get exceeded the configured timeout of 1 ms.
java.io.IOException: Total Stellar REST request time to http://httpbin.org/get exceeded the configured timeout of 1 ms.
    at org.apache.metron.stellar.dsl.functions.RestFunctions$RestGet.doGet(RestFunctions.java:188)
    at org.apache.metron.stellar.dsl.functions.RestFunctions$RestGet.apply(RestFunctions.java:157)
    at org.apache.metron.stellar.common.StellarCompiler.lambda$exitTransformationFunc$13(StellarCompiler.java:652)
    at org.apache.metron.stellar.common.StellarCompiler$Expression.apply(StellarCompiler.java:250)
    at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:151)
    at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.executeStellar(DefaultStellarShellExecutor.java:409)
    at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:260)
    at org.apache.metron.stellar.common.shell.cli.StellarShell.execute(StellarShell.java:357)
    at org.jboss.aesh.console.AeshProcess.run(AeshProcess.java:53)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
9. You can also configure which status codes should be handled as errors.
A 404 is considered an error by default:
```
[Stellar]>>> REST_GET('http://httpbin.org/status/404', config)
2018-10-28 00:56:08 ERROR RestFunctions:161 - Stellar REST request to http://httpbin.org/status/404 expected status code to be one of [200] but failed with http status code 404:
java.io.IOException: Stellar REST request to http://httpbin.org/status/404 expected status code to be one of [200] but failed with http status code 404:
    at org.apache.metron.stellar.dsl.functions.RestFunctions$RestGet.doGet(RestFunctions.java:209)
    at org.apache.metron.stellar.dsl.functions.RestFunctions$RestGet.apply(RestFunctions.java:157)
    at org.apache.metron.stellar.common.StellarCompiler.lambda$exitTransformationFunc$13(StellarCompiler.java:652)
    at org.apache.metron.stellar.common.StellarCompiler$Expression.apply(StellarCompiler.java:250)
    at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:151)
    at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.executeStellar(DefaultStellarShellExecutor.java:409)
    at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:260)
    at org.apache.metron.stellar.common.shell.cli.StellarShell.execute(StellarShell.java:357)
    at org.jboss.aesh.console.AeshProcess.run(AeshProcess.java:53)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
This behavior can be changed by configuring a 404 to be an acceptable
status code and returning an empty object instead of null:
```
[Stellar]>>> config := {'response.codes.allowed':[200,404],'empty.content.override':{}}
{response.codes.allowed=[200, 404], empty.content.override={}}
[Stellar]>>> REST_GET('http://httpbin.org/status/404', config)
{}
```
10. The value returned on an error can also be changed from null:
```
[Stellar]>>> config := {'error.value.override':'got an error'}
[Stellar]>>> result := REST_GET('http://httpbin.org/status/500', config)
2018-10-28 00:59:41 ERROR RestFunctions:161 - Stellar REST request to http://httpbin.org/status/500 expected status code to be one of [200] but failed with http status code 500:
java.io.IOException: Stellar REST request to http://httpbin.org/status/500 expected status code to be one of [200] but failed with http status code 500:
    at org.apache.metron.stellar.dsl.functions.RestFunctions$RestGet.doGet(RestFunctions.java:209)
    at org.apache.metron.stellar.dsl.functions.RestFunctions$RestGet.apply(RestFunctions.java:157)
    at org.apache.metron.stellar.common.StellarCompiler.lambda$exitTransformationFunc$13(StellarCompiler.java:652)
    at org.apache.metron.stellar.common.StellarCompiler$Expression.apply(StellarCompiler.java:250)
    at org.apache.metron.stellar.common.BaseStellarProcessor.parse(BaseStellarProcessor.java:151)
    at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.executeStellar(DefaultStellarShellExecutor.java:409)
    at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:260)
    at org.apache.metron.stellar.common.shell.specials.AssignmentCommand.execute(AssignmentCommand.java:66)
    at org.apache.metron.stellar.common.shell.DefaultStellarShellExecutor.execute(DefaultStellarShellExecutor.java:255)
    at org.apache.metron.stellar.common.shell.cli.StellarShell.execute(StellarShell.java:357)
    at org.jboss.aesh.console.AeshProcess.run(AeshProcess.java:53)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
got an error
```
To test with the parser and enrichment topologies, follow these steps:
1. Make sure the topologies are running and data is flowing through. It's
easier to test if you restart the parser topology with only a single sensor
running.
2. Add a Stellar field transformation to the parser that is still running:
```
"fieldTransformations": [
  {
    "input": [],
    "output": [
      "parser_rest_result"
    ],
    "transformation": "STELLAR",
    "config": {
      "parser_rest_result": "REST_GET('http://httpbin.org/get?type=parser')"
    }
  }
],
```
3. Listen on the enrichments Kafka topic. The `parser_rest_result` field
should now be present.
4. Add a Stellar enrichment to the sensor:
```
"fieldMap": {
  "geo": [
    "ip_dst_addr",
    "ip_src_addr"
  ],
  "host": [
    "host"
  ],
  "stellar": {
    "config": {
      "enrichment_rest_result": "REST_GET('http://httpbin.org/get?type=enrichment')"
    }
  }
}
```
5. Listen on the indexing Kafka topic. The `enrichment_rest_result` field
should now be present.
### Outstanding Issues
- Currently the Stellar REPL does not quit cleanly. I suspect it's because
the client is not closed but I'm still investigating.
- I would like to add a section describing in more detail how to use this
function, including an explanation of what the various settings do. Where
should this go? I don't see anything like this in the stellar-common README, so
I wanted to get some guidance from the community.
- Caching was briefly discussed in the discuss thread for this feature.
Stellar provides a caching mechanism but we may need to be more selective about
what is cached right now. I believe this should be a follow-on.
- There was a comment in the Jira related to adding a circuit breaker.
Does that need to be done in this PR or can it be a follow-on? Should we also
explore/discuss a retry strategy?
- It was also suggested in the discuss thread that we create an abstraction
for higher-latency enrichments such as this. I would prefer we create a few
of these higher-latency functions first so that we have a better understanding
of how this abstraction would look. Do we want to take that on here?
## Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our [Development
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
for the complete guide to follow for contributions.
Please refer also to our [Build Verification
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow
these guidelines and ask you to double check the following:
### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to
be created at [Metron
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA
number you are trying to resolve? Pay particular attention to the hyphen "-"
character.
- [x] Has your PR been rebased against the latest commit within the target
branch (typically master)?
### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been
executed in the root metron folder via:
```
mvn -q clean integration-test install &&
dev-utilities/build-utils/verify_licenses.sh
```
- [x] Have you written or updated unit tests and/or integration tests to
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building
and running locally with Vagrant full-dev environment or the equivalent?
### For documentation related changes:
- [x] Have you ensured that format looks appropriate for the output in
which it is rendered by building and verifying the site-book? If not then run
the following commands and then verify the changes via
`site-book/target/site/index.html`:
```
cd site-book
mvn site
```
#### Note:
Please ensure that once the PR is submitted, you check travis-ci for build
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up
for your personal repository such that your branches are built there before
submitting a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/merrimanr/incubator-metron METRON-1850
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metron/pull/1250.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1250
----
commit acbfdda2e4dfd5034bea19e6c7d18acc2c6a1e17
Author: merrimanr <merrimanr@...>
Date: 2018-10-31T13:19:34Z
initial commit
commit 1287d7f674c4197167f9237cf3d6749e77936230
Author: merrimanr <merrimanr@...>
Date: 2018-10-31T18:56:04Z
expression config should be a map and not a string
----
---