GitHub user cestella opened a pull request:
https://github.com/apache/incubator-metron/pull/89
METRON-119 Move PCAP infrastructure from HBase
As it stands, the existing approach to handling PCAP data has trouble keeping
up with high-volume packet capture. With the advent of a DPDK plugin for
capturing packet data, we will hit throughput limits on the consumption side
if we continue to push packet data into HBase at line speed.
Furthermore, storing PCAP data in HBase limits the range of filter
queries that we can perform (i.e. only those expressible within the row key).
As of now, we require all fields to be present (source IP/port, destination
IP/port, and protocol) rather than allowing any wildcards.
To address these issues, we should create a higher-performance topology
which attaches the appropriate header to the raw packet and timestamp read from
Kafka (as placed onto Kafka by the packet capture sensor) and appends this
packet to a sequence file in HDFS. The sequence file will be rolled based on
the number of packets or on time (e.g. one hour's worth of packets per
sequence file).
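
For illustration, here is a minimal sketch of what that HDFS write path could
look like, assuming a Hadoop `SequenceFile` keyed by packet timestamp with the
raw packet bytes as the value. The class name, output path, and count-based
roll threshold below are placeholders, not the actual implementation in this PR:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

/** Illustrative writer: appends (timestamp, raw packet) pairs to a
 *  SequenceFile on HDFS and rolls to a new file after maxPackets. */
public class PcapSequenceFileWriter {
  private final Configuration conf = new Configuration();
  private final long maxPackets;  // hypothetical count-based roll threshold
  private long count = 0;
  private SequenceFile.Writer writer;

  public PcapSequenceFileWriter(long maxPackets) {
    this.maxPackets = maxPackets;
  }

  public void append(long timestamp, byte[] packet) throws IOException {
    if (writer == null || count >= maxPackets) {
      roll(timestamp);
    }
    writer.append(new LongWritable(timestamp), new BytesWritable(packet));
    count++;
  }

  private void roll(long timestamp) throws IOException {
    if (writer != null) {
      writer.close();
    }
    // One file per roll, named here by the first timestamp it contains.
    Path path = new Path("/apps/metron/pcap/pcap_" + timestamp + ".seq");
    writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(path),
        SequenceFile.Writer.keyClass(LongWritable.class),
        SequenceFile.Writer.valueClass(BytesWritable.class));
    count = 0;
  }

  public void close() throws IOException {
    if (writer != null) writer.close();
  }
}
```

A time-based roll would work the same way, with the check in `append`
comparing elapsed time since the last roll instead of (or in addition to) the
packet count.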
On the query side, we should adjust the middle-tier service layer to start
an MR job over the appropriate set of sequence files and extract the packets
matching the filter. NOTE: the UI modifications to make this reasonable for
the end user will need to be done in a follow-on JIRA.
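
As a rough sketch of what the map side of that filtering job could look like
(the `PcapFilter` interface and the trivial placeholder predicate are
hypothetical; the real job would parse the packet headers and compare against
the requested source/destination IP, port, and protocol):

```java
import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;

/** Illustrative mapper: passes through only the packets that match the
 *  requested filter, reading (timestamp, raw packet) pairs from the
 *  sequence files written on the ingest side. */
public class PcapFilterMapper
    extends Mapper<LongWritable, BytesWritable, LongWritable, BytesWritable> {

  /** Hypothetical predicate over (timestamp, raw packet bytes). */
  public interface PcapFilter {
    boolean test(long timestamp, byte[] packet);
  }

  // Placeholder filter; a real job would build this from the query
  // parameters and parse the packet headers before comparing.
  private final PcapFilter filter = (ts, pkt) -> pkt.length > 0;

  @Override
  protected void map(LongWritable timestamp, BytesWritable packet,
                     Context context) throws IOException, InterruptedException {
    if (filter.test(timestamp.get(), packet.copyBytes())) {
      context.write(timestamp, packet);
    }
  }
}
```

Since the key is the packet timestamp and the value is the raw packet, the
job's output can then be given the appropriate headers to produce a valid
PCAP payload, per the description above.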
In order to test this PR, I would suggest doing the following as the "happy
path":
1. Install the pycapa library & utility via the instructions
[here](https://github.com/apache/incubator-metron/tree/master/metron-sensors/pycapa)
2. (if using singlenode vagrant) Kill the enrichment and sensor topologies
via `for i in bro enrichment yaf snort;do storm kill $i;done`
3. Start the pcap topology via
`/usr/metron/0.1BETA/bin/start_pcap_topology.sh`
4. Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa
--producer --topic pcap -i eth1 -k node1:6667`
5. Watch the topology in the [Storm UI](http://node1:8744/index.html) and
kill the packet capture utility from step 4 once the number of packets ingested
exceeds 1k.
6. Ensure that at least 2 files exist on HDFS by running `hadoop fs -ls
/apps/metron/pcap`
7. Choose a file (denoted by $FILE) and dump a few of its contents using
the `pcap_inspector` utility via `/usr/metron/0.1BETA/bin/pcap_inspector.sh -i
$FILE -n 5`
8. Choose one of the lines and note the source IP/port and destination IP/port.
9. Go to the Kibana app at [http://node1:5000](http://node1:5000) on the
singlenode vagrant (YMMV on EC2) and enter that query in the Kibana PCAP panel.
10. Wait patiently while the MR job completes; the results are returned as a
valid PCAP payload suitable for opening in Wireshark.
11. Open the payload in Wireshark to ensure it is valid.
If the payload is not valid PCAP, please look at the [job
history](http://node1:19888/jobhistory) and note the reason for the job
failure, if any.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cestella/incubator-metron
pcap_extraction_topology
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/89.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #89
----
commit fee3d0d327fccba464a62c75029cdc733a8d2a56
Author: cstella <[email protected]>
Date: 2016-02-26T14:41:37Z
Adding pcap infrastructure
commit d6da175e7b7072c585650d5604c7dd886177962c
Author: cstella <[email protected]>
Date: 2016-02-26T15:47:08Z
Updating kafka component to be more featureful.
commit 0a9dba8771939c28f65efd585fa501a4b0a4125b
Author: cstella <[email protected]>
Date: 2016-02-26T22:09:56Z
Updating topology and integration test.
commit 374b391c29621aac26fd9ef3ad54872f78a6e960
Author: cstella <[email protected]>
Date: 2016-03-01T15:00:58Z
Updating integration test.
commit 88d3d1572ca4b812c2438c8219434f2f19f1467d
Author: cstella <[email protected]>
Date: 2016-03-02T20:34:33Z
Fixed weird situation with HDFS, made the callback handle multiple
partitions, added licenses
commit d485bfa7415fbc8a5c34c3cf327468ce37b07847
Author: cstella <[email protected]>
Date: 2016-03-03T01:18:50Z
Updating topology.
commit 8a9706bf012c8041e3a8ea08e8924073cde887f1
Author: cstella <[email protected]>
Date: 2016-03-03T02:22:11Z
Merging can be fun, but this one was not. Merging in master with some
overlapping files from my feature branch that made their way into master via
another feature.
commit d99cb74892ac2624d77895368874de76edd274d8
Author: cstella <[email protected]>
Date: 2016-03-14T15:35:55Z
Merging from master.
commit 3f8daa693decc815c4c0328be9dc6994ae8a4310
Author: cstella <[email protected]>
Date: 2016-03-14T17:59:56Z
Updating component runner and integration test.
commit 86771b087d4ef38f87333be5027c4935fa79173e
Author: cstella <[email protected]>
Date: 2016-03-16T21:00:38Z
Integrating a proper integration test and service layer call.
commit 3cd17f1823b92661426bf21ea618c50cbb1ae2bf
Author: cstella <[email protected]>
Date: 2016-03-17T19:36:31Z
Updating integration test.
commit 52fb7b28163267d4e321a5becc1c4a8e73eff3ea
Author: cstella <[email protected]>
Date: 2016-03-18T13:05:25Z
Updating integration test.
commit 6f1e24f96f3fa96319337fda6385babee4ed2abb
Author: cstella <[email protected]>
Date: 2016-03-18T15:06:10Z
Updating classpath issues.
commit ae8a5c1f55de5daa467bae7d32977175efc5b4bb
Author: cstella <[email protected]>
Date: 2016-04-05T19:18:01Z
Merged master into feature branch.
commit 542ee9e19b9ef2c371f95a8143cad307f6a44347
Author: cstella <[email protected]>
Date: 2016-04-07T13:35:21Z
merged master in.
commit 3705c4719b73613c1d8f559672bbbbb31b14ff02
Author: cstella <[email protected]>
Date: 2016-04-07T17:37:03Z
Reverting some very bad things that I did.
commit c7f837704f17510ed3881066fd9b50a3ed889f2b
Author: cstella <[email protected]>
Date: 2016-04-07T21:42:47Z
Fixing spout config and integration test
commit b25cdaad2cf59f6448fbca368f2c5b0103750735
Author: cstella <[email protected]>
Date: 2016-04-08T14:35:01Z
Making this work with pycappa as well.
commit 182c151901de23b6d98435762276cd2802e685ba
Author: cstella <[email protected]>
Date: 2016-04-08T15:09:36Z
Updating integration test to work with timestamp in the key as well as
timestamp pulled from the data.
commit cc02302f8c4c55b380f3fbbf018ff21e74570819
Author: cstella <[email protected]>
Date: 2016-04-08T15:34:30Z
Moved around some stuff and realized I was not using unsigned comparisons.
commit e0d47a5aa94500b0954ae12449a270a5a2022830
Author: cstella <[email protected]>
Date: 2016-04-11T13:52:41Z
Headerizing in the converter.
commit 69f49959c470f1b73eb6d579661bcdc257c7010b
Author: cstella <[email protected]>
Date: 2016-04-11T13:56:42Z
Still have some weird serialization error, but will fix shortly.
commit f30595d151b823d23e1c8682343aafab6c45a30d
Author: cstella <[email protected]>
Date: 2016-04-11T20:10:14Z
Updating converters to implement serializable.
commit 09004e1f4566d4aad4ef349d6ddb013e1991c4b2
Author: cstella <[email protected]>
Date: 2016-04-19T12:45:10Z
Merge branch 'master' into pcap_extraction_topology
commit f52e57968b94591f0750659c3546403cd8d56e79
Author: cstella <[email protected]>
Date: 2016-04-19T21:01:34Z
Updating next gen pcap to include a notion of endianness that is
configurable.
commit bce86caf5047d9fbb42995b90d6e1d1842ee3cb2
Author: cstella <[email protected]>
Date: 2016-04-19T21:16:22Z
Added licenses.
commit f8dc3460c6678ba5c0e83e0d7cb21dce854810bc
Author: cstella <[email protected]>
Date: 2016-04-19T21:30:41Z
updated licenses and added a global_shade_version because the one in
Metron-Common was very old.
commit cb1288697de8da0cbfd6fc3b253ac3cbb40f698e
Author: cstella <[email protected]>
Date: 2016-04-20T12:54:56Z
Merge branch 'master' into pcap_extraction_topology
commit dfc3558496740d2429e755a7a23ca18943601e9f
Author: cstella <[email protected]>
Date: 2016-04-20T16:08:04Z
Moving stuff out of common.
commit f6e2567f21ef698531568593383ac732c7670a18
Author: cstella <[email protected]>
Date: 2016-04-20T20:17:39Z
We don't need to be configurable for the endianness..I can figure that out
from the JVM.
----