Re: [Ntop-misc] General questions and documentation of nprobe internals

Simone Mainardi Thu, 21 Dec 2017 08:17:39 -0800

Mark,


> On 20 Dec 2017, at 13:25, Mark Petronic <[email protected]> wrote:
> 
> I am running with nprobe 8.2 in collector mode. I am currently designing a 
> collection infrastructure so I want to try to understand what nprobe is doing 
> internally to better understand how data is being processed. I have a number of 
> questions in regard to this. I have read the latest version of the user guide 
> PDF but still have some questions. I tried to organize my questions in blocks 
> to hopefully allow for easier commenting on each question. This is fairly 
> long but I figured asking this all together, in context, would be better. 
> Thanks in advance to whoever takes this on - I really appreciate it. :)
> 
> Is there any detailed documentation on what is going on internally with 
> nprobe? In particular, I am using it as a collector to forward UDP netflow v9 
> from our Cisco routers to Kafka. I am particularly interested in 
> understanding some of these stats and what they "infer" is happening under 
> the hood:
> 
> 19/Dec/2017 13:36:09 [nprobe.c:3202] Average traffic: [0.00 pps][All Traffic 
> 0 b/sec][IP Traffic 0 b/sec][ratio -nan]
> 19/Dec/2017 13:36:09 [nprobe.c:3210] Current traffic: [0.00 pps][0 b/sec]
> 19/Dec/2017 13:36:09 [nprobe.c:3216] Current flow export rate: [1818.5 
> flows/sec]
> 19/Dec/2017 13:36:09 [nprobe.c:3219] Flow drops: [export queue too 
> long=0][too many flows=0][ELK queue flow drops=0]
> 19/Dec/2017 13:36:09 [nprobe.c:3224] Export Queue: 0/512000 [0.0 %]
> 19/Dec/2017 13:36:09 [nprobe.c:3229] Flow Buckets: 
> [active=92792][allocated=92792][toBeExported=0]
> 19/Dec/2017 13:36:09 [nprobe.c:3235] Kafka [flows exported=366299/1818.5 
> flows/sec][msgs sent=366299/1.0 flows/msg][send errors=0]
> 19/Dec/2017 13:36:09 [nprobe.c:3260] Collector Threads: [167203 pkts@0] 
> 19/Dec/2017 13:36:09 [nprobe.c:3052] Processed packets: 0 (max bucket search: 
> 8)
> 19/Dec/2017 13:36:09 [nprobe.c:3035] Fragment queue length: 0
> 19/Dec/2017 13:36:09 [nprobe.c:3061] Flow export stats: [0 bytes/0 pkts][0 
> flows/0 pkts sent]
> 19/Dec/2017 13:36:09 [nprobe.c:3068] Flow collection:   [collected pkts: 
> 167203][processed flows: 4561802]
> 19/Dec/2017 13:36:09 [nprobe.c:3071] Flow drop stats:   [0 bytes/0 pkts][0 
> flows]
> 19/Dec/2017 13:36:09 [nprobe.c:3076] Total flow stats:  [0 bytes/0 pkts][0 
> flows/0 pkts sent]
> 19/Dec/2017 13:36:09 [nprobe.c:3087] Kafka [flows exported=366299][msgs 
> sent=366299/1.0 flows/msg][send errors=0]
> 
> For these two stats:
> 
> Flow collection:   [collected pkts: 167203][processed flows: 4561802]
> Kafka [flows exported=366299][msgs sent=366299/1.0 flows/msg][send errors=0]
> 
> I am thinking they mean that 167203 UDP packets were received from routers 
> comprising a total of 4561802 individual flow records. However, I see only 
> 366299 flows exported to Kafka. So, am I correct in assuming that nprobe is 
> doing some internal aggregation of flow records that is essentially squashing 
> the 4561802 received flow records into 366299 aggregates?

Yes, it does some internal aggregation based on the 5-tuple and the configured 
timeouts (see --lifetime-timeout and --idle-timeout).

If you want to disable any internal aggregation, use the --disable-cache option.
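
As a rough illustration of 5-tuple aggregation with the two timeouts, here is a
minimal Python sketch. It is not nprobe's actual code: the FlowCache and Bucket
names, the fields, and the expiry loop are assumptions made for the example.
Only the two default values come from the help text quoted in this thread.

```python
from dataclasses import dataclass

LIFETIME_TIMEOUT = 120  # seconds, --lifetime-timeout default per the quoted help text
IDLE_TIMEOUT = 30       # seconds, --idle-timeout default per the quoted help text


@dataclass
class Bucket:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Hypothetical cache that merges records sharing a 5-tuple."""

    def __init__(self):
        self.buckets = {}

    def add(self, five_tuple, packets, octets, now):
        """Fold one collected flow record into the bucket for its 5-tuple."""
        bucket = self.buckets.get(five_tuple)
        if bucket is None:
            bucket = self.buckets[five_tuple] = Bucket(first_seen=now, last_seen=now)
        bucket.packets += packets
        bucket.octets += octets
        bucket.last_seen = now

    def expire(self, now):
        """Remove and return buckets that reached the lifetime or idle timeout."""
        expired = {}
        for key, bucket in list(self.buckets.items()):
            too_old = now - bucket.first_seen >= LIFETIME_TIMEOUT
            too_idle = now - bucket.last_seen >= IDLE_TIMEOUT
            if too_old or too_idle:
                expired[key] = self.buckets.pop(key)
        return expired
```

In this sketch a bucket that receives a single record is held until the idle
timeout fires. In the stats quoted above, 4561802 processed flows against
366299 exported flows works out to roughly 12 collected records per exported
flow.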

> 
> A follow on question to this, then, is related to:
> 
> Flow Buckets: [active=92792][allocated=92792][toBeExported=0]
> 
> What are these and how are they utilized? Again, I am assuming these are hash 
> buckets used for internal aggregation per the user's guide.

Correct

> I have seen warnings indicating that the allotment of these buckets is too 
> small and to expect drops. So, my guess is, based on flows/sec ingested, 
> these have to be sized appropriately to support the flow volume. Is that a 
> correct assumption? 

Correct. Increase it with --hash-size if necessary.

> 
> I also notice that, when I start up nprobe in collector mode publishing to 
> Kafka, it takes about 30 or more seconds before any flows actually are 
> published to Kafka. This leads me to believe internal aggregations are 
> occurring that are delaying publishing of data.

Yes, this depends on the configured timeouts, so you should use --disable-cache 
if you want nProbe to simply act as a transparent proxy.

> If I crank up the --verbose to 2, I can see UDP packets being processed and 
> then, after some time, I start to see log messages indicating flows are being 
> exported to Kafka. It is not as much the latency issue I am concerned with 
> here but rather just understanding what is happening so that I can properly 
> monitor and configure/size the system.
> 
> Do these parameters impact the utilization of the flow buckets in collector 
> mode or just when running in sniffer mode? I ask because I know the routers 
> are already doing aggregation, meaning they accumulate counts for flows over 
> time before emitting a record for an active flow. Does it mean that nprobe 
> is then doing the same thing again for these flows and essentially 
> aggregating already aggregated flow records coming from my routers?
> 
> [--lifetime-timeout|-t] <timeout>   It specifies the maximum (seconds) flow 
> lifetime [default=120]
> [--idle-timeout|-d] <timeout>       It specifies the maximum (seconds) flow 
> idle lifetime [default=30]
> [--queue-timeout|-l] <timeout>      It specifies how long expired flows 
> (queued before delivery) are emitted [default=30]
> 
> 
> Also, based on the assumption of aggregating already aggregated data and the 
> type of traffic on the network I am monitoring (lots of short-lived 
> transactions, like credit card swipe processing by vendors and DNS lookups), 
> does it even make sense to have nprobe aggregating this traffic that I know 
> is NOT going to consist of more than one flow record anyway?
> 
> The user document does not mention anything about monitoring nprobe 
> programmatically. What is the best way to monitor nprobe for internal packet 
> drops? I can get various OS stats from /proc/xxx, like UDP queue size, drops, 
> etc, but I need nprobe internal stats to round out the picture. I see that 
> there is information like this on stdout:
> 
> Flow drops: [export queue too long=0][too many flows=0][ELK queue flow 
> drops=0]

Use -b=1 to have traffic stats periodically output to the terminal or log file.
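
If the periodic stats end up in a log file, a monitoring check could pull the
counters out of the "Flow drops" line. The Python sketch below assumes the line
format shown in this thread and that each stats entry sits on a single line in
the log. That format is not a documented interface and may differ between
versions. The parse_flow_drops name is made up for the example.

```python
import re

# Matches bracketed "name=value" counters such as "[too many flows=0]".
COUNTER_RE = re.compile(r"\[([^\[\]=]+)=(\d+)\]")


def parse_flow_drops(line):
    """Return {counter name: value} for a 'Flow drops:' log line, else None."""
    if "Flow drops:" not in line:
        return None
    _, _, tail = line.partition("Flow drops:")
    return {name.strip(): int(value) for name, value in COUNTER_RE.findall(tail)}


sample = ("19/Dec/2017 13:36:09 [nprobe.c:3219] Flow drops: [export queue too "
          "long=0][too many flows=0][ELK queue flow drops=0]")
print(parse_flow_drops(sample))
```

A Nagios check could compare each returned value against a threshold, and the
same dictionary could be posted to InfluxDB as fields.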

> 
> However, I want to monitor my nprobe instances with Nagios and generate 
> alerts on threshold checks as well as track utilization over time by posting 
> periodic stats to our InfluxDB/Grafana setup. Is there some way (other than 
> parsing stdout in a log) to gain programmatic access to these stats for 
> monitoring tools to use?
> 
> Regarding Kafka, the producer has many configuration options but only very 
> few are exposed for configuration in nprobe. Let me ask these one by one:
> 
> batch.size, linger.ms, buffer.memory - These are 
> essential to controlling batching in Kafka. nprobe has options 
> --kafka-enable-batch and --kafka-batch-len. However, these end up wrapping N 
> messages into a JSON array of size N and publishing that to Kafka. I feel 
> this is a wrong approach. Consider the downstream Kafka consumer. It expects 
> to receive a series of messages off a topic. The format of those messages 
> should not change due to batching. When batching is not enabled in nprobe, 
> the consumer sees a series of JSON dictionaries - each a single flow record. 
> When batching is enabled, the consumer now sees a series of JSON arrays, each 
> with N JSON dictionaries. IMO, the proper way to do this is to use the Kafka 
> configuration values to control batching. In that case, the producer simply 
> queues up messages (each a dictionary) and, when configured thresholds are 
> met, emits those messages. This results in a batch of dictionaries being sent 
> and the consumer ONLY sees dictionaries. Changing the message structure due 
> to batching complicates things for consumers and is not a typical pattern in 
> Kafka processing.

Our experiments showed a 4x speedup with the implemented batching.
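
On the consumer side, one way to cope with both message shapes is to normalize
each message value before processing it. The Python sketch below assumes, as
described in the question, that an unbatched message is one JSON dictionary and
a batched message is a JSON array of dictionaries. The flow_records helper is
hypothetical and not part of nprobe or of any Kafka client.

```python
import json


def flow_records(payload):
    """Yield flow dictionaries from one Kafka message value (bytes or str)."""
    decoded = json.loads(payload)
    if isinstance(decoded, list):
        # Batched export: one message carries an array of flow records.
        yield from decoded
    else:
        # Unbatched export: one message carries a single flow record.
        yield decoded
```

Downstream code can then iterate over flow_records(message_value) without
caring whether --kafka-enable-batch was set on the producer.
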
> Options topic - Your documentation does not even mention this (nprobe --help 
> does) but I don't understand what it means? What is a Kafka options topic?

It contains messages for the netflow option templates.

> Partitioning - If we want to perform stream processing of netflow data, then we 
> want to ensure that all flow records from a given n-tuple are placed on the 
> same Kafka partition. We need to partition the data because it is the only 
> way to scale consumers in Kafka. If I want to perform some aggregations on 
> the data stream then I have to be sure that all netflow records for a given 
> conversation, for example, are on the same topic partition. A simple example 
> that will make that happen would be to use the IPV4_SRC_ADDR field of the 
> flow record as the partition key. Or, maybe an N-tuple of (IPV4_SRC_ADDR, 
> IPV4_DST_ADDR, L4_SRC_PORT, L4_DST_PORT) as the partition key. In Java, a 
> producer would do this by hashing the string that comprises the partition key 
> desired then doing a hash % num-partitions to figure out the partition to 
> send the message on. I am guessing that nprobe relies on the default 
> partitioning scheme in the producer which is a simple round-robin approach 
> based on the number of partitions that exist for the topic being used. This, 
> however, would randomly distribute flow records for a given conversation 
> across multiple partitions and, therefore across multiple consumers in a 
> downstream consumer group. That would break the aggregations. So, my request 
> is that you consider allowing a configuration option that enables the user to 
> define the partition key. This might be done, for example, by allowing the 
> user to define a CSV list of template fields to use to form the partition key 
> string. You could just concatenate them together and hash that value then 
> modulo divide by the number of partitions for the topic being used and use 
> that to enable the producer to publish on the appropriate topic partition. 
> This gives the user the freedom to define the partition key while making the 
> implementation in nprobe fairly generic. Maybe this could also be done via some 
> sort of "partition plugin" to make it even more extensible? Have you 
> considered any such capability? Without such a capability, we will have to 
> initially publish all flows on a, say, "netflow-raw" topic (using round-robin) 
> then consume this topic in a consumer group only to republish it by 
> repartitioning it (as described above using some N-tuple of fields) only to 
> then be consumed by another consumer group that will be doing the aggregations 
> and enrichments needed. Sure, we can make it work but partitioning should 
> "really" be done at the source. The approach I just described necessarily 
> doubles our broker traffic which I would not like to have to do.

I see. Being able to control which data ends up in which shard would give a 
lot more flexibility. Currently this feature is not implemented, but if you 
contact us privately we can try to work together toward a more controllable 
hashing of the flows for Kafka.
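
For reference, the key-based partitioning described in the question can be
sketched in a few lines of Python. This is only an illustration of the idea:
nprobe does not implement it today, the partition_for function is hypothetical,
and CRC32 is a stand-in chosen because it is deterministic across processes
(Python's built-in hash() is salted for strings). It is not the hash used by
Kafka's Java default partitioner.

```python
import zlib


def partition_for(record, key_fields, num_partitions):
    """Map a flow record to a partition by hashing the chosen template fields."""
    # Join the selected field values into one key string.
    key = "|".join(str(record[field]) for field in key_fields)
    # Deterministic hash, reduced modulo the topic's partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Note that with this key the two directions of a conversation hash differently,
since source and destination are swapped. If both directions must land on the
same partition, the endpoints would need to be put in a canonical order before
hashing.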


Simone
> Producer Options in General - Why not just make them all configurable? For 
> example, allow the user to define a name=value config file using any 
> supported producer configuration options and provide the path to the file as 
> an nprobe Kafka configuration option. Then, when you instantiate the producer 
> in nprobe, read in those configuration values and pass them into the 
> producer. This gives the users access to all options available and not just 
> the current topic, acks, and compression values.
> 
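
The name=value file proposed above could be read with something like the
following Python sketch. The file format, the comment syntax, and the
load_producer_options name are all assumptions made for illustration. Per this
thread, nprobe does not currently offer such an option.

```python
def load_producer_options(text):
    """Parse name=value lines into a dict, skipping blank lines and # comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got: {raw!r}")
        options[name.strip()] = value.strip()
    return options
```

The resulting dictionary would then be handed to the producer at construction
time, so any option the client library supports could be set without a
dedicated nprobe flag.
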
> Miscellaneous Notes:
> The v8.1 users guide lists "New Options --kafka-enable-batch 
> and--kafka-batch-len to batch flow export to kafka" but does not provide any 
> detailed documentation on these. Looks like someone forgot to add the 
> description of these later in the document.
> nprobe --help shows this under the Kafka options:  "<options topic> Flow 
> options topic" but the v8.1 user's guide makes no mention of it. I have no 
> idea what an options topic is.
> _______________________________________________
> Ntop-misc mailing list
> [email protected]
> http://listgateway.unipi.it/mailman/listinfo/ntop-misc
