GitHub user cestella opened a pull request:
https://github.com/apache/incubator-metron/pull/93
METRON-119 Move PCAP infrastructure from HBase
As it stands, the existing approach to handling PCAP data struggles with
high-volume packet capture. With the advent of a DPDK plugin for capturing
packet data, we are going to hit throughput limitations if we continue to
push packet data into HBase at line-speed.
Furthermore, storing PCAP data in HBase limits the range of filter
queries that we can perform (i.e., only those expressible within the key). As
of now, we require all fields to be present (source IP/port, destination
IP/port, and protocol) rather than allowing any wildcards.
To address these issues, we should create a higher-performance topology
which attaches the appropriate header to the raw packet and timestamp read from
Kafka (as placed onto Kafka by the packet capture sensor) and appends the
packet to a sequence file in HDFS. The sequence file will be rolled based on
the number of packets or on time (e.g., 1 hour's worth of packets per sequence
file).
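The roll-by-count-or-time behavior described above can be sketched as follows. This is an illustrative sketch, not the Metron implementation; the class name `RollCondition` and its parameters are hypothetical, and the real topology would wire this into a Hadoop `SequenceFile` writer.

```python
import time

class RollCondition:
    """Decide when the current sequence file should be rolled.

    Hypothetical illustration: roll after a fixed number of packets
    OR after a fixed age (e.g., 1 hour's worth of packets per file).
    """

    def __init__(self, max_packets=100_000, max_age_seconds=3600):
        self.max_packets = max_packets          # roll after this many packets
        self.max_age_seconds = max_age_seconds  # ...or after this many seconds
        self.count = 0
        self.opened_at = time.monotonic()

    def record(self):
        """Call once per appended packet; returns True when a roll is due."""
        self.count += 1
        age = time.monotonic() - self.opened_at
        return self.count >= self.max_packets or age >= self.max_age_seconds

    def reset(self):
        """Start tracking a freshly opened sequence file."""
        self.count = 0
        self.opened_at = time.monotonic()
```

Whichever threshold fires first triggers the roll, which is why the happy-path test below expects at least two files on HDFS once enough packets have been ingested.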
On the query side, we should adjust the middle-tier service layer to start
an MR job over the appropriate set of sequence files to select the matching
packets. NOTE: the UI modifications to make this reasonable for the end user
will need to be done in a follow-on JIRA.
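The key difference from the HBase-keyed lookup is that the MR job can treat any query field as a wildcard. A minimal sketch of such a filter predicate, assuming hypothetical field names (`src_ip`, `src_port`, `dst_ip`, `dst_port`, `protocol`) and a `matches` helper not taken from the actual codebase:

```python
# A missing (None) constraint acts as a wildcard and matches any value,
# unlike the old HBase key, which required every field to be present.
WILDCARD = None

def matches(query, packet):
    """Return True if every non-wildcard constraint equals the packet field."""
    fields = ("src_ip", "src_port", "dst_ip", "dst_port", "protocol")
    return all(
        query.get(f) is WILDCARD or query.get(f) == packet.get(f)
        for f in fields
    )
```

In the real job, a predicate along these lines would run in the map phase over each sequence file's packets, emitting only the matches.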
In order to test this PR, I would suggest doing the following as the "happy
path":
1. Install the pycapa library & utility via instructions
[here](https://github.com/apache/incubator-metron/tree/master/metron-sensors/pycapa)
2. (if using the single-node Vagrant environment) Kill the enrichment and
sensor topologies via `for i in bro enrichment yaf snort;do storm kill $i;done`
3. Start the pcap topology via
`/usr/metron/0.1BETA/bin/start_pcap_topology.sh`
4. Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa
--producer --topic pcap -i eth1 -k node1:6667`
5. Watch the topology in the [Storm UI](http://node1:8744/index.html) and
kill the packet capture utility from before when the number of packets ingested
is over 1k.
6. Ensure that at least 2 files exist on HDFS by running `hadoop fs -ls
/apps/metron/pcap`
7. Choose a file (denoted by $FILE) and dump a few of the contents using
the `pcap_inspector` utility via `/usr/metron/0.1BETA/bin/pcap_inspector.sh -i
$FILE -n 5`
8. Choose one of the lines and note the source IP/port and destination IP/port.
9. Go to the Kibana app at [http://node1:5000](http://node1:5000) on the
single-node Vagrant environment (YMMV on EC2) and input that query in the
Kibana PCAP panel.
10. Wait while the MR job completes; the results are returned in the form of
a valid PCAP payload suitable for opening in Wireshark.
11. Open the payload in Wireshark to ensure it is valid.
If the payload is not valid PCAP, please look at the [job
history](http://node1:19888/jobhistory) and note the reason for the job
failure, if any.
Also, please note the changes and additions to the documentation for the [pcap
service](https://github.com/cestella/incubator-metron/tree/METRON-119/metron-streaming/metron-api)
and [pcap
backend](https://github.com/cestella/incubator-metron/tree/METRON-119/metron-platform/metron-pcap-backend).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cestella/incubator-metron METRON-119
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/93.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #93
----
commit e5062606519bda57eb7c1a739317e4f2011cddd1
Author: cstella <[email protected]>
Date: 2016-04-28T17:51:57Z
METRON-119 Move the PCAP topology from HBase
commit 99bf1632a7e5ed3d36137ec326626c0b0f84d4bf
Author: cstella <[email protected]>
Date: 2016-04-28T17:56:05Z
Updating the documentation.
----